CONTROL OF A SPACESHIP

JOHN BATHER* and HERMAN CHERNOFF
1. Introduction and summary

Imagine a spaceship travelling towards a certain planet with predetermined
speed, in a direction which will bring it close to the target after a known period
of time. Observations on the position of the target, relative to the present course,
are made continuously and lead to a gradually improving prediction of the eventual
miss distance. On the other hand, the fuel available in the spaceship for
making minor changes in the direction of motion is gradually losing its effectiveness.
This is because the final change of position caused by a small velocity imposed
perpendicular to the present motion is roughly proportional to the remaining
time. Thus we have a control problem which is essentially one of compromise
between two extremes: using the fuel early, and perhaps in the wrong way,
because of poor information; or waiting too long for more precise information,
so that the fuel becomes ineffective.

The statistical decision problem considered here actually arises from a simplified
formulation of the above question, but one which contains its main features.
We suppose first that the motion of the spaceship relative to its target is confined
to a fixed plane with the target as origin. The horizontal component of velocity
is fixed as unity, so that the time coordinate $\tau < 0$ also measures the horizontal
distance $-\tau$ still to be travelled before the target is passed. It is enough to represent the
vertical components of position and velocity together by $\mu$, the height at which
the present line of motion meets the axis $\tau = 0$. However, $\mu$ is unknown, and
must be estimated continuously by observing a certain stochastic process
$\{W(\tau);\ \tau < 0\}$ whose mean drift is $\mu$ per unit time.

A second fiction, which will be maintained throughout the present paper, is
that an infinite quantity of fuel is available for adjusting the vertical velocity,
and hence $\mu$, at a fixed price $c$ per unit change of velocity. Thus at any time $\tau$,
an instantaneous velocity increment $\Delta$, costing $c\Delta$, will change the unknown
quantity $\mu$ by a known amount $\Delta|\tau|$. The problem is to find a control procedure
which minimizes the sum of all fuel costs together with a cost associated with
the final miss. For the most part, we shall assume that this terminal cost is given
by $\tfrac12 k\mu^2$. Because of the symmetry of this function, the direction of any control
applied is always determined by the sign of the current estimate of $\mu$. The
important question is how large this estimate must be before it is advantageous
to take some action.

Section 2 is concerned with finding a convenient description of the state of the
system. Under the assumption that the information process $\{W(\tau)\}$ is Gaussian
with suitable initial conditions, the posterior distribution of $\mu$ has the normal
form $\mathfrak{N}(y, s)$, where the variance $s$ is a strictly increasing, approximately
linear function of $-\tau$. Then, by transforming the time parameter, we can conveniently
regard $s$ as the time to go. The corresponding mean $y$ is determined, after
observing a related stochastic process $\{Y(s')\}$ during $s' \ge s$, simply by $y = Y(s)$.
It turns out that $\{Y(s);\ s > 0\}$ is a standard Wiener process in the $(-s)$ scale.
In other words, given any position $(y, s)$, if no controlling action is taken before
$s$ falls to $s - \delta$, then a new position $(y + \delta Y, s - \delta)$ arises in which
$\delta Y = Y(s - \delta) - Y(s)$ has the conditional distribution $\mathfrak{N}(0, \delta)$.

Both actions and costs are easily translated into the $(y, s)$ coordinate system.
An action is represented by an instantaneous change in the value of $y$, and a shift
$\Delta y$ in either direction incurs a cost $D(s)|\Delta y|$, where $D(s)$ is a specified positive
function. The terminal cost, incurred when any position $(y, 0)$ is reached, is
given in general by a function $R(y, 0)$ which is easily evaluated since, at this
final stage, $\mu = y$ is known. In the special case mentioned previously, suitable
changes of scale lead to a standard form in which $R(y, 0) = \tfrac12 y^2$ and $D(s) = 1/s$.

Since the pair $(y, s)$ always provides a complete description of the state of
the system, it follows that in seeking an optimal control procedure to minimize
costs, we may restrict attention to policies which depend only on these coordinates.
In effect, we must classify each point $(y, s)$ in the half-plane $s > 0$
according to whether or not some action is involved when that position arises.

For a discrete time variation of this problem, where fuel could be used only at
certain specified times, C. T. Striebel and F. Tung [9] used dynamic programming
techniques to show that an optimal procedure can be expressed in terms of
a boundary $\tilde y(s)$ as follows: if $s$ corresponds to an allowable action time and
$|y| > \tilde y(s)$, use fuel to go to $(\operatorname{sgn} y)\,\tilde y(s)$; otherwise, do not use fuel.
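A minimal sketch of this threshold rule, assuming the boundary is supplied as an arbitrary positive function `y_tilde` (a hypothetical placeholder, not the optimal curve, which is not known in closed form) and pricing fuel with the standard form $D(s) = 1/s$. The function names are illustrative only.

```python
import math
from typing import Callable


def control_shift(y: float, s: float, y_tilde: Callable[[float], float]) -> float:
    """Shift applied to y at an allowable action time s.

    If |y| exceeds the boundary y_tilde(s), y is moved to sgn(y) * y_tilde(s);
    otherwise no fuel is used and the shift is zero.
    """
    b = y_tilde(s)
    if abs(y) > b:
        return math.copysign(b, y) - y
    return 0.0


def fuel_cost(shift: float, s: float) -> float:
    """Cost D(s) * |shift| in the standard form D(s) = 1/s."""
    return abs(shift) / s
```

For example, with the illustrative boundary `y_tilde = lambda s: math.sqrt(s)`, a position $y = 3$ at $s = 4$ is moved down to 2, a shift of $-1$ costing $1/4$, while $y = 1$ is left alone.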

This result clearly indicates that the solution of the continuous time version
of the problem has the same form. It is now also clear that variations of the
problem with different and possibly asymmetric terminal costs $R(y, 0)$ would
lead to similar solutions. That is, the optimal policy corresponds to an action
region $\mathcal{A}$. If $(y, s)$ is inside the action region, fuel must be applied instantaneously
to bring $(y, s)$ to an appropriate point of the waiting region $\Omega$, which is the
complement of $\mathcal{A}$. This characterization is also suggested by results of R. T.
Orford [7] on a related problem.

The optimal boundary curves $\pm\tilde y(s)$ can be determined, in principle, in terms
of the Bayes risk function $R(y, s)$, which represents the minimum expected cost
incurred when one starts at the point $(y, s)$. The properties of $R(y, s)$, and the
fact that the optimal procedure corresponds to the solution of a free boundary
problem, are discussed in section 3. C. T. Striebel [8] independently and previously
derived the “necessity” of these free boundary conditions, in the sense that
conditions are presented under which there is an optimal procedure that satisfies
the free boundary problem. The effect of a policy determined by boundary
curves $\pm\tilde y(s)$ is to constrain the process $\{Y(s)\}$ so that its trajectory always lies
within or on the boundary of the region $\Omega$. To see how this works, let us consider
only the modifications imposed at the upper boundary. Conditional on
$Y(s_0) = y_0$, for any initial position $(y_0, s_0)$ we define the process

(1.1) $M(s_1) = \max\bigl[0,\ \sup_{s_0 \ge s \ge s_1}\{Y(s) - \tilde y(s)\}\bigr], \qquad (s_1 < s_0).$

This represents the cumulative effect of suppressing the original path below the
curve $\tilde y(s)$, and would lead to a position $(y_1, s_1)$, where $y_1 = Y(s_1) - M(s_1)$.
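A discretized sketch of (1.1), assuming a uniform grid in $s$ and a hypothetical boundary function `y_tilde`: the running supremum of $Y(s) - \tilde y(s)$, floored at zero, gives $M(s)$, and the controlled position $Y(s) - M(s)$ then never exceeds the boundary at the grid points. The names `constrained_path` and `y_tilde`, and the grid parameters, are illustrative only.

```python
import math
import random
from typing import Callable, List, Tuple


def constrained_path(y0: float, s0: float, s1: float, n: int,
                     y_tilde: Callable[[float], float],
                     rng: random.Random) -> Tuple[List[float], List[float], List[float]]:
    """Simulate Y from s0 down to s1 and suppress it below y_tilde, as in (1.1).

    Y is a standard Wiener process in the (-s) scale, so a step of length
    h = (s0 - s1) / n adds an N(0, h) increment. Returns the s grid, the
    cumulative control M(s), and the controlled position Y(s) - M(s).
    """
    h = (s0 - s1) / n
    y = y0
    running = y0 - y_tilde(s0)          # running sup of Y(s) - y_tilde(s)
    s_grid, m_vals, controlled = [s0], [max(0.0, running)], [y0 - max(0.0, running)]
    for k in range(1, n + 1):
        s = s0 - k * h
        y += rng.gauss(0.0, math.sqrt(h))
        running = max(running, y - y_tilde(s))
        m = max(0.0, running)
        s_grid.append(s)
        m_vals.append(m)
        controlled.append(y - m)
    return s_grid, m_vals, controlled
```

Because $M$ is a running maximum, the total control applied is nondecreasing as $s$ falls, which is the discrete analogue of suppressing the path below $\tilde y(s)$.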

It is instructive to regard the policy as the limit of a sequence of restrictions
to discrete time, as the corresponding time intervals approach zero. Here each
member of the sequence is defined by a discrete set of values of $s$, and actions,
determined by the critical levels $\pm\tilde y(s)$, are allowed only at these specified instants.
Of course, when actions are restricted in this way the results are suboptimal,
but any discrete time formulation can be considered in its own right. The
relation between the two approaches is illustrated by the fact that sequences of
optimal discrete time policies, and the associated risk functions, converge in a
natural way to their continuous counterparts.

The last assertion can be justified by reference to similar results which have
been established for sequential tests of a normal mean [5], a problem which is,
in a certain sense, equivalent to ours. The relevant characteristics of the testing
problem, originally described in [3], are illustrated in the following stopping
problem.

This rather artificial version of the testing problem, in which the $s$-axis forms one
boundary, will be associated with our present control problem. We restrict attention
to the quadrant $y, s > 0$ of the plane. Changes of position within this region
occur exactly as before, according to the process $\{Y(s)\}$, but termination
may occur in any one of three ways: (1) if a position $(0, s)$ is reached, the process
stops automatically without cost; (2) similarly, the process must stop at any position
$(y, 0)$, and a cost $R_y(y, 0)$ is incurred; and (3) in any other position $(y, s)$,
not on either axis, one may elect to stop and pay a specified amount $D(s)$.

In this problem, the optimal risk function $V(y, s)$ determines the optimal policy
in a very simple way. Any point $(y, s)$ must be assigned to the stopping or action
region if $V(y, s) = D(s)$, and to the continuation region if $V(y, s) < D(s)$. In particular,
the strict inequality indicates that, at the particular position $(y, s)$, there is a
policy which achieves a definite advantage over the given stopping cost.

It will be shown in section 4 that $V(y, s)$ and the derivative $R_y(y, s)$ of the
original risk function, restricted to $y, s > 0$, are determined by precisely the same
properties and may be identified. Thus the same curve $\tilde y(s)$ defines the optimal
policy for both problems.

With this interpretation of $R_y(y, s)$, the general techniques discussed in [1]
can be applied to find approximations to the unknown boundary. In spite of
their formal similarity, there is a practical difference between the two problems.
For example, the restriction to $y > 0$ is unnatural for the original testing problem,
and such cost functions as $D(s) = 1/s$ and $R_y(y, 0) = y$ would be inappropriate.
It is perhaps more accurate to describe the minimization of $V(y, s)$ as a stopping
problem. Then it is natural to refer to the upper halves of $\mathcal{A}$ and $\Omega$ as the optimal
stopping and continuation regions.

Section 5 is concerned with obtaining specific inner and outer approximations
to the optimal boundary curve $\tilde y(s)$ for the above special case. As a consequence,
$\tilde y(s)$ is determined within fairly narrow limits for all values of $s$. In particular, it
is deduced that


(1.2) $\tilde y(s) = \dfrac{1}{s} + O(s^2), \qquad (s \to 0),$

(1.3) $\tilde y(s) = s^{1/2} + O(s^{-1/2}), \qquad (s \to \infty).$

The main object of the remaining sections is to indicate techniques which can
be used to refine these asymptotic bounds. One such refinement shows that
$\tilde y(s) = s^{-1} + \tfrac12 s^2 + o(s^2)$ as $s \to 0$, and this is supplemented by a formal expansion
which gives $\tilde y(s) = 1/s + \tfrac12 s^2 - \tfrac12 s^5 + \tfrac72 s^8 + \cdots$. For $s \to \infty$, we establish that
$\tilde y(s)s^{-1/2} - 1$ is roughly of the order $s^{-\eta_0}$, where $\eta_0 = 1.61005$.

Our approach throughout is based on comparisons between the given problem
and certain auxiliary stopping problems for which the solution is known. For
example, the treatment of the case $s \to 0$ is closely related to the solution, as
$s \to \infty$, of another optimal stopping problem, defined by the stopping cost
$d(y, s) = -s$ for $s > 0$, and by the terminal cost $d(y, 0) = \min(y, 0)$ on the line $s = 0$.
In section 6 it is shown that the optimal boundary for this auxiliary problem is
$z(s) = -s + \tfrac12 + o(1)$ as $s \to \infty$, and that the corresponding minimum risk is
$v(y, s) = y - \tfrac12\exp(2y + 2s - 1) + o(1)$ in the continuation region. This result
is applied to give $\tilde y(s) = s^{-1} + \tfrac12 s^2 + o(s^2)$ as $s \to 0$. The same ideas motivate
the formal expansion for $\tilde y(s)$ in section 7.
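The leading terms quoted for the auxiliary problem can be checked numerically. The sketch below ignores the $o(1)$ corrections and uses finite differences (step sizes are arbitrary choices) to confirm that $v(y, s) = y - \tfrac12\exp(2y + 2s - 1)$ satisfies the diffusion equation $\tfrac12 v_{yy} = v_s$, and that $v = -s$ and $v_y = 0$ on $y = z(s) = -s + \tfrac12$.

```python
import math


def v(y: float, s: float) -> float:
    """Leading-order continuation risk v(y, s) = y - exp(2y + 2s - 1) / 2."""
    return y - 0.5 * math.exp(2.0 * y + 2.0 * s - 1.0)


def z(s: float) -> float:
    """Leading-order optimal boundary z(s) = -s + 1/2."""
    return -s + 0.5


def heat_residual(y: float, s: float, h: float = 1e-3) -> float:
    """Central-difference estimate of v_s - v_yy / 2 (zero for a solution)."""
    v_s = (v(y, s + h) - v(y, s - h)) / (2.0 * h)
    v_yy = (v(y + h, s) - 2.0 * v(y, s) + v(y - h, s)) / h ** 2
    return v_s - 0.5 * v_yy


for s in (1.0, 3.0, 10.0):
    y_b = z(s)
    slope = (v(y_b + 1e-6, s) - v(y_b - 1e-6, s)) / 2e-6
    print(f"s={s}: v(z,s)+s={v(y_b, s) + s:.1e}, v_y={slope:.1e}, "
          f"residual={heat_residual(y_b - 1.0, s):.1e}")
```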

The refined inner and outer bounds for $s \to \infty$ are presented in section 8. We
rely on two facts. In the first place, $u(y, s) = s^{-\lambda/2}\,\alpha\,F\{(\lambda + 1)/2, \tfrac32, -\alpha^2/2\}$, with
$\alpha = y s^{-1/2}$, is a solution of the basic diffusion equation satisfied by $V(y, s)$.
Here $F$ is the confluent hypergeometric function. Second, there is a number
$\lambda_0 = 2\eta_0 + 2$, which is the smallest $\lambda > 1$ such that $u$ vanishes when $\alpha = 1$.
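As a sketch, $u(y, s)$ can be evaluated with a direct power series for $F(a, b, x) = \sum_n (a)_n x^n / ((b)_n\, n!)$, which is adequate for moderate $|x|$ (an assumption of this example, not a general-purpose implementation), and the diffusion equation $\tfrac12 u_{yy} = u_s$ checked by central differences. For $\lambda = 2$ the series reduces to $F(\tfrac32, \tfrac32, x) = e^x$, so that $u = s^{-1}\alpha e^{-\alpha^2/2}$.

```python
import math


def kummer_f(a: float, b: float, x: float, tol: float = 1e-16, max_terms: int = 500) -> float:
    """Confluent hypergeometric F(a, b, x) = sum_n (a)_n x^n / ((b)_n n!), by direct summation."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * x / ((b + n) * (n + 1))
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def u(y: float, s: float, lam: float) -> float:
    """u(y, s) = s^(-lam/2) * alpha * F((lam + 1)/2, 3/2, -alpha^2/2), alpha = y / sqrt(s)."""
    alpha = y / math.sqrt(s)
    return s ** (-lam / 2.0) * alpha * kummer_f((lam + 1.0) / 2.0, 1.5, -alpha ** 2 / 2.0)


def diffusion_residual(y: float, s: float, lam: float, h: float = 1e-3) -> float:
    """Central-difference estimate of u_yy / 2 - u_s (zero for a solution)."""
    u_s = (u(y, s + h, lam) - u(y, s - h, lam)) / (2.0 * h)
    u_yy = (u(y + h, s, lam) - 2.0 * u(y, s, lam) + u(y - h, s, lam)) / h ** 2
    return 0.5 * u_yy - u_s
```

A library routine for the confluent hypergeometric function would be preferable for large $|x|$, where the alternating series loses accuracy.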

In the final section, assumptions slightly different from those of section 2 lead
to a stopping risk of the form $D(y, s) = s^{-1} + a$ for $y > 0$, $s > 0$. In this case, a
formal expansion of the type described in [4], [6] is initiated to show that
$\tilde y(s)s^{-1/2} \sim (2\log s)^{1/2}$ as $s \to \infty$.

From the practical point of view, our assumption that there is an infinite
quantity of fuel available in the spaceship is unsatisfactory. The assumption can
be relaxed at the expense of dealing with a third variable. We have found that
the present approach can still be applied in a fairly straightforward manner. It is
hoped that these developments will be discussed later.

The authors wish to thank J. V. Breakwell for introducing them to this problem,
and Herman Rubin for the benefit of considerable discussion. In particular,
Rubin pointed out that $\eta_0$ exists and computed its value.


2. Preliminaries


In studying the information process, we may imagine that $\mu$ is a fixed parameter.
Any controls are easily taken into account, since they lead to corresponding
translations of $\mu$. We suppose that initially $\mu$ has a normal prior distribution,
denoted by $\mathfrak{N}(y_0, s_0)$, and that the observed process $\{W(\tau);\ \tau_0 \le \tau < 0\}$ has
independent normal increments with mean $\mu$ and known variance $\sigma^2(\tau)$ per unit
time. The situation is analogous to one in which a succession of independent
normal observations have a common unknown mean and arbitrary known variances.
In that case, it is not difficult to verify that the posterior distribution is
normal at every stage, and a similar result holds here, where observation takes
place continuously. Thus the information accumulated up to the instant $\tau$ can
be summarized in a posterior distribution of the form $\mathfrak{N}(y, s)$. Let us consider
briefly how the parameters $y(\tau)$ and $s(\tau)$ behave.

It is enough to represent the results $W(\tau')$, $\tau \le \tau' \le \tau + \delta\tau$, of a very short
period of observation by the final value or, equivalently, by the total increment
$\delta W$. For example, this would certainly be valid for a constant variance function
$\sigma^2(\tau')$, since $\delta W$ would then constitute a sufficient statistic for the new observations.
We must investigate the joint distribution of $\mu$ and $\delta W$, where the
marginal for $\mu$ is $\mathfrak{N}(y, s)$. Then, by finding the conditional distribution
of $\mu$ given $\delta W$, which is the new posterior distribution at $\tau + \delta\tau$, the
increments $\delta y$ and $\delta s$ can be evaluated. This calculation leads to the following
equations.


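(2.1) $\delta s = -\dfrac{s^2}{\sigma^2(\tau)}\,\delta\tau + o(\delta\tau),$

(2.2) $\delta y = \dfrac{s}{\sigma^2(\tau)}\,(\delta W - y\,\delta\tau) + o(\delta\tau).$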

The first corresponds to a simple differential equation for $s(\tau)$, and the solution is

(2.3) $\dfrac{1}{s} = \dfrac{1}{s_0} + \int_{\tau_0}^{\tau} \dfrac{du}{\sigma^2(u)}.$


This determines $s$ as a strictly decreasing function of $\tau$. On the other
hand, $y$ changes stochastically. We replace $\delta y$ by $\delta Y$ in order to represent the
increment as a random variable, conditional on the information available at
time $\tau$. With this conditioning, $\delta W$ can be expressed as

$\delta W = \mu\,\delta\tau + \sigma\epsilon_1(\delta\tau)^{1/2} = (y + s^{1/2}\epsilon_2)\,\delta\tau + \sigma\epsilon_1(\delta\tau)^{1/2},$

where $\epsilon_1, \epsilon_2$ are independent standard normal variates. It follows from (2.2)
that, to leading order, $\delta Y = (s/\sigma)\,\epsilon_1(\delta\tau)^{1/2}$, whose variance $s^2\delta\tau/\sigma^2$ equals $-\delta s$
by (2.1); thus $\delta Y$ has the distribution $\mathfrak{N}(0, -\delta s)$. Hence we have a representation
of the original information process by a derived process $\{Y(s)\}$, for which the
decreasing quantity $s$ is a natural index.
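The updating rules (2.1) and (2.2) can be illustrated by a discrete normal-normal (conjugate) filter. In the sketch below, the variance function `sigma2`, the true drift `mu`, and the step count are illustrative assumptions. Each step adds $\delta\tau/\sigma^2$ to the posterior precision, and the mean moves by $(s_{\text{new}}/\sigma^2)(\delta W - y\,\delta\tau)$, the exact discrete counterpart of (2.2).

```python
import math
import random
from typing import Callable, Tuple


def filter_posterior(y0: float, s0: float, tau0: float, tau1: float, n: int,
                     sigma2: Callable[[float], float], mu: float,
                     rng: random.Random) -> Tuple[float, float]:
    """Conjugate updating of the posterior N(y, s) for mu from simulated increments.

    Over each step of length d the increment dW is drawn as N(mu*d, sigma2*d),
    with sigma2 evaluated at the step midpoint. Returns the final (y, s).
    """
    d = (tau1 - tau0) / n
    y, s = y0, s0
    for k in range(n):
        var = sigma2(tau0 + (k + 0.5) * d)
        dw = mu * d + math.sqrt(var * d) * rng.gauss(0.0, 1.0)
        s_new = 1.0 / (1.0 / s + d / var)      # precisions add, as in (2.3)
        y += (s_new / var) * (dw - y * d)      # exact discrete form of (2.2)
        s = s_new
    return y, s
```

With $\sigma^2(\tau) = a\tau^2$, the integral in (2.3) is elementary, so the simulated variance can be compared directly with a closed form; the posterior mean, by contrast, depends on the simulated path.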

Our treatment of the information does not depend on any special characteristics
of the variance function $\sigma^2(\tau)$, but the intended application suggests a
particular function, for which the determination of the Wiener process can be
made more explicit. We are regarding $-\tau$ as the horizontal distance which
remains, and it is reasonable to suppose that any errors in observing the target
will have standard deviations proportional to this distance. In view of this, it
is important to consider the special case where

(2.4) $\sigma^2(\tau) = a\tau^2, \qquad (\tau < 0).$

Then relation (2.3) reduces to

(2.5) $s = \dfrac{-a\tau}{1 - \theta\tau}, \qquad (\tau_0 \le \tau < 0),$

where the constant $\theta = a/s_0 + 1/\tau_0$. The function $s(\tau)$ is approximately linear, and
exactly so if $s_0 = -a\tau_0$. It was assumed earlier that the initial information is in
a convenient normal form, and we might pretend further that the corresponding
variance $s_0$ is such that $\theta = 0$. Alternatively, it is not unrealistic to suppose that
both $s_0$ and $-\tau_0$ are very large, and in the limit, again, $\theta$ vanishes. Then we have


(2.6) $s = -a\tau,$

and $s > 0$ can be interpreted as the time to go.
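A small check of the algebra leading from (2.3) to (2.5) when $\sigma^2(\tau) = a\tau^2$; the function names and test values are illustrative. When $s_0 = -a\tau_0$, the constant $\theta$ vanishes and (2.5) collapses to (2.6).

```python
def s_closed_form(tau: float, a: float, s0: float, tau0: float) -> float:
    """(2.5): s = -a*tau / (1 - theta*tau), with theta = a/s0 + 1/tau0."""
    theta = a / s0 + 1.0 / tau0
    return -a * tau / (1.0 - theta * tau)


def s_from_integral(tau: float, a: float, s0: float, tau0: float) -> float:
    """(2.3) with sigma^2(u) = a*u^2: 1/s = 1/s0 + (1/tau0 - 1/tau) / a."""
    return 1.0 / (1.0 / s0 + (1.0 / tau0 - 1.0 / tau) / a)
```

For instance, with $a = 0.5$, $\tau_0 = -10$ and $s_0 = 5 = -a\tau_0$, both functions give $s = 0.5 = -a\tau$ at $\tau = -1$.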

The function $y(\tau)$, and hence the observed path of the process $\{Y(s);\ s_0 \ge s > 0\}$,
can be evaluated by solving the stochastic differential equation (2.2).
On substituting the special forms (2.4) and (2.6), it becomes $\delta y - (y/\tau)\,\delta\tau = -(1/\tau)\,\delta W$.
Here there is no difficulty, since $W(\tau)$ is a.s. continuous, and the solution
is


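(2.7) $\dfrac{y}{\tau} = \dfrac{y_0}{\tau_0} - \dfrac{W(\tau)}{\tau^2} - 2\int_{\tau_0}^{\tau} \dfrac{W(u)}{u^3}\,du,$

where $W$ is measured from $W(\tau_0) = 0$; this follows by integrating $d(y/\tau) = -\tau^{-2}\,dW$ by parts.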
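The closed form (2.7) can be compared with a direct Euler discretization of $\delta y = (y/\tau)\,\delta\tau - (1/\tau)\,\delta W$ along one simulated path. This is a sketch under assumed values of $a$, $\mu$ and $y_0$, with (2.4) and (2.6) taken to hold; the integral in (2.7) is approximated by the trapezoidal rule, and `euler_vs_formula` is a hypothetical helper name.

```python
import math
import random
from typing import Tuple


def euler_vs_formula(y0: float, tau0: float, tau1: float, n: int,
                     a: float, mu: float, rng: random.Random) -> Tuple[float, float]:
    """Return (Euler solution, closed form (2.7)) for y at tau1 along one simulated path.

    Assumes sigma^2(tau) = a*tau^2 and s = -a*tau, so that (2.2) reads
    dy = (y/tau) dtau - dW/tau.
    """
    d = (tau1 - tau0) / n
    taus = [tau0 + k * d for k in range(n + 1)]
    w = [0.0]                                   # W measured from W(tau0) = 0
    y = y0
    for k in range(n):
        t = taus[k]
        # increment with mean mu*d and variance a*t^2*d, as in (2.4)
        dw = mu * d + abs(t) * math.sqrt(a * d) * rng.gauss(0.0, 1.0)
        y += (y / t) * d - dw / t               # Euler step
        w.append(w[-1] + dw)
    # trapezoidal rule for the integral of W(u)/u^3 over [tau0, tau1]
    integral = sum(0.5 * d * (w[k] / taus[k] ** 3 + w[k + 1] / taus[k + 1] ** 3)
                   for k in range(n))
    y_closed = tau1 * (y0 / tau0 - w[-1] / tau1 ** 2 - 2.0 * integral)
    return y, y_closed
```

The two values agree closely because the Euler recursion for $y/\tau$ is a summation-by-parts analogue of the integration by parts behind (2.7).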


From now on, we shall treat the control problem entirely in terms of the $(y, s)$
coordinate system. In general, let $D(s)$ denote the cost of an optional unit change
in the value of $y$. Any such action results in a corresponding translation of the
whole future path $\{Y(s');\ s \ge s' \ge 0\}$. The terminal cost incurred on arrival
at any point on the $y$-axis is given by $R(y, 0)$.

One further preliminary task remains, and this is to normalize the specification
of our main application. When equation (2.6) holds and the price of fuel is $c$, it
follows from the discussion in section 1 that $D(s) = -c/\tau = ac/s$. Also, we have
$R(y, 0) = \tfrac12 k y^2$. Now consider the transformation

(2.8) $y^* = \beta y, \qquad s^* = \beta^2 s.$

By relating the scale changes in this way, we have ensured that the transformed
process $\{Y^*(s^*)\}$ is still a standard Wiener process. For example,

(2.9) $\operatorname{var}(\delta Y^*) = \beta^2 \operatorname{var}(\delta Y) = \beta^2(-\delta s) = -\delta s^*.$
We can also change the unit of cost by a factor $\gamma$, so that in the new system the
cost of a shift $\Delta y^*$ is given by

(2.10) $D^*(s^*)\,\Delta y^* = \gamma\,\dfrac{ac}{s}\,\Delta y = \dfrac{\beta\gamma ac}{s^*}\,\Delta y^*.$

The new terminal cost is

(2.11) $R^*(y^*, 0) = \tfrac12\gamma k y^2 = \dfrac{\gamma k}{2\beta^2}\,y^{*2}.$

It follows, by examining these final coefficients, that if

(2.12) $\beta = a^{-1/3}c^{-1/3}k^{1/3}, \qquad \gamma = a^{-2/3}c^{-2/3}k^{-1/3},$

then $D^*(s^*) = 1/s^*$ and $R^*(y^*, 0) = \tfrac12 y^{*2}$.
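The constants in (2.12) are easy to verify numerically: with $\beta$ and $\gamma$ as given, the coefficient $\beta\gamma ac$ in (2.10) and the coefficient $\gamma k/\beta^2$ in (2.11) should both equal 1. A minimal sketch, with arbitrary positive test values for $a$, $c$, $k$:

```python
def normalizing_constants(a: float, c: float, k: float) -> tuple:
    """beta and gamma of (2.12)."""
    beta = a ** (-1.0 / 3.0) * c ** (-1.0 / 3.0) * k ** (1.0 / 3.0)
    gamma = a ** (-2.0 / 3.0) * c ** (-2.0 / 3.0) * k ** (-1.0 / 3.0)
    return beta, gamma


def normalized_coefficients(a: float, c: float, k: float) -> tuple:
    """Coefficients in D*(s*) = (beta*gamma*a*c) / s* and R*(y*, 0) = (gamma*k/beta**2) * y*^2 / 2."""
    beta, gamma = normalizing_constants(a, c, k)
    return beta * gamma * a * c, gamma * k / beta ** 2
```

The time rescaling factor $\beta^2 = a^{-2/3}(k/c)^{2/3}$ is the one that appears in (2.13) below.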

In the later sections it will be assumed, without loss of generality, that
$ac = k = 1$, but before we discard the present notation, it is worth mentioning
a consequence of the scale changes. It will be shown that the optimal policy for
the normalized version is determined by a curve $\tilde y^*(s^*)$ such that $\alpha^*(s^*) \to 1$ as
$s^* \to \infty$, where $\alpha^*(s^*) = \tilde y^*(s^*)/s^{*1/2}$. From the practical point of view, it is
reasonable to consider values for the constants with $k/c \gg 1$, since it may be
relatively important to ‘hit’ the target. In this case, the optimal boundary $\tilde y(s)$
in the original specification is given approximately by $\tilde y(s) \approx s^{1/2}$ for every $s > 0$.
More precisely, for any fixed value of $s$, writing $\alpha(s) = \tilde y(s)/s^{1/2}$,

(2.13) $\alpha(s) = \alpha^*\bigl(s\,a^{-2/3}(k/c)^{2/3}\bigr),$

which converges to 1 as $k/c \to \infty$, by the asymptotic result just quoted.

3. Properties of the risk function

In this section, we consider the characteristic properties of the Bayes risk
function $R(y, s)$. The actual costs of any procedure must be calculated according
to specified continuous functions $D(s)$ and $R(y, 0)$, where the latter is symmetric
and convex. In general, for $s > 0$, $R(y, s)$ can be defined as the infimum, over all
possible control procedures, of the total expected cost incurred after starting in
the position $(y, s)$. However, this local definition is not completely satisfactory,
since we are assuming that there is a control procedure, determined by curves
$y = \pm\tilde y(s)$, which is uniformly optimal for every position. Hence $R(y, s)$ can
be used alternatively to denote the risk function for a particular (well-behaved)
policy. It will be assumed further that $R(y, s)$ possesses continuous partial
derivatives $R_{yy}$ and $R_s$, except perhaps at points along the boundary curves. A
slightly deficient justification for the heuristic argument will emerge later. Its
proper foundation depends, as we shall see, on the existence of a solution with
the appropriate formal properties, but no such proof will be attempted. Even for
the application we have in mind, it is no easy task to find an explicit solution.
However, the technique of comparing the central problem with similar cases for
which the formal solution is known, and therefore justified, should leave little
room for doubt.
It is clear that the optimal risk $R(y, s)$ must be symmetric in $y$, and for the
most part we shall restrict attention to positions with $y, s > 0$. Consider first
the value of an action which changes $y$ by an amount $\Delta y$. The original risk, expressed
in terms of the new position, is simply $D(s)|\Delta y| + R(y + \Delta y, s)$, and
since the action may or may not be profitable, we have $R(y, s) \le D(s)|\Delta y| +
R(y + \Delta y, s)$. It follows, by letting $\Delta y$ approach zero through positive and
negative values, that in general

(3.1) $|R_y(y, s)| \le D(s).$

Again, when $y > \tilde y(s)$ and the optimal policy prescribes a shift $\Delta y = \tilde y(s) - y$,
we obtain

(3.2) $R(y, s) = R(\tilde y(s), s) + D(s)\,(y - \tilde y(s)).$

The risk function is linear in $y$ throughout the action region, and $R_y(y, s) =
\pm D(s)$ according to the sign of $y$. In view of this, the determination of $R(y, s)$
depends largely on its properties within the continuation region $\Omega$ and at the
boundary.

We now show that $R(y, s)$ is a solution of the diffusion equation

(3.3) $\tfrac12 R_{yy} = R_s, \qquad (|y| < \tilde y(s),\ s > 0).$

Let $(y, s) \in \Omega$ and consider the transition to $(y + \delta Y, s - \delta)$ after a short period
of length $\delta$. It follows that $R(y, s) = E[R(y + \delta Y, s - \delta)]$, apart from the possibility
that some action is necessary during the period. Here we can rely on the
fact that $\delta Y$ is distributed as $\mathfrak{N}(0, \delta)$. Only terms of order $\delta$ will be needed in
evaluating the expectation. By making use of the differentiability assumptions,
it is not difficult to establish the following expansion:

(3.4) $R(y, s) = E\bigl[R(y, s) + R_y(y, s)\,\delta Y + \tfrac12 R_{yy}(y, s)\,\delta Y^2 - R_s(y, s)\,\delta\bigr] + o(\delta),$

or equivalently,

(3.5) $0 = R_y(y, s)\,E(\delta Y) + \tfrac12 R_{yy}(y, s)\,E(\delta Y^2) - R_s(y, s)\,\delta + o(\delta).$

But $E(\delta Y) = 0$ and $E(\delta Y^2) = \delta$, and since the coefficient of $\delta$ must vanish, the result
is equation (3.3).
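The one-step argument behind (3.4) and (3.5) can be illustrated numerically. The sketch below computes $E[R(y + \delta Y, s - \delta)] - R(y, s)$, with $\delta Y \sim \mathfrak{N}(0, \delta)$, by Simpson's rule. The test functions are illustrative choices whose Gaussian expectations are exact: $R = y^2 + s$ satisfies (3.3), so the change is zero, while for $R = y^2 + 2s$ the change equals $(\tfrac12 R_{yy} - R_s)\,\delta = -\delta$.

```python
import math


def gaussian_expectation(f, var: float, n: int = 2000, width: float = 12.0) -> float:
    """E[f(Z)] for Z ~ N(0, var), by Simpson's rule over +-width standard deviations (n even)."""
    sd = math.sqrt(var)
    h = 2.0 * width / n
    total = 0.0
    for i in range(n + 1):
        x = -width + i * h
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * f(sd * x) * math.exp(-0.5 * x * x)
    return total * h / 3.0 / math.sqrt(2.0 * math.pi)


def one_step_change(R, y: float, s: float, delta: float) -> float:
    """E[R(y + dY, s - delta)] - R(y, s) with dY ~ N(0, delta) and no action taken."""
    return gaussian_expectation(lambda dy: R(y + dy, s - delta), delta) - R(y, s)
```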

Thus, in order to determine the risk function, we must solve a differential
equation; but the boundary is unknown, and it is not immediately clear what
conditions should be imposed there. Let us assume a priori that $R(y, s)$ itself
must be continuous at the boundary, but that its derivatives may have simple
discontinuities. The previous analysis can be adapted for positions on the
boundary curves, but a more sensitive treatment is needed.

Consider a fixed point $(y_0, s_0)$ on the upper boundary curve. We suppose that
$\tilde y(s) = y_0 + o((s_0 - s)^{1/2})$ as $s \to s_0-$, which is slightly stronger than our
original continuity assumption. After a short period $\delta$, the new position is
$(Y(s_0 - \delta) - M(s_0 - \delta),\ s_0 - \delta)$, where $Y(s_0) = y_0$, $Y(s_0 - \delta) = y_0 + \delta Y$, and
$M(s_0 - \delta)$ represents the total action which occurs. It will be enough to retain
terms of order $\delta^{1/2}$ in relating the corresponding risks, and hence we can approximate
the distribution of $M(s_0 - \delta)$:
(3.6) $M(s_0 - \delta) = \max\bigl[0,\ \sup_{s_0 \ge s \ge s_0 - \delta}\{Y(s) - \tilde y(s)\}\bigr]$

$\qquad = \sup_{s_0 \ge s \ge s_0 - \delta}\bigl[\{Y(s) - Y(s_0)\} - \{\tilde y(s) - \tilde y(s_0)\}\bigr]$

$\qquad = \sup_{s_0 \ge s \ge s_0 - \delta}\{Y(s) - Y(s_0)\} + o(\delta^{1/2}).$

It can be shown by symmetry considerations (the reflection principle) that the main term here has
precisely the same distribution as $|\delta Y|$. In particular,

(3.7) $E\{M(s_0 - \delta)\} = E\{|\delta Y|\} + o(\delta^{1/2}) = (2\delta/\pi)^{1/2} + o(\delta^{1/2}).$
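A Monte Carlo sketch of (3.7): in the $(-s)$ scale, $Y(s) - Y(s_0)$ over $s_0 \ge s \ge s_0 - \delta$ is a Brownian motion run for time $\delta$, so by the reflection principle both its running maximum and $|\delta Y|$ should average about $(2\delta/\pi)^{1/2}$. The maximum is taken over a grid and is therefore biased slightly downward; the step and path counts are arbitrary choices.

```python
import math
import random
from typing import Tuple


def sup_and_endpoint_means(delta: float, steps: int, paths: int,
                           rng: random.Random) -> Tuple[float, float]:
    """Monte Carlo means of sup_{0 <= t <= delta} B(t) and |B(delta)| for Brownian motion B.

    The supremum is taken over a grid of `steps` points, so it is biased slightly low.
    """
    sd = math.sqrt(delta / steps)
    sup_total = abs_total = 0.0
    for _ in range(paths):
        b = running_max = 0.0
        for _ in range(steps):
            b += rng.gauss(0.0, sd)
            if b > running_max:
                running_max = b
        sup_total += running_max
        abs_total += abs(b)
    return sup_total / paths, abs_total / paths
```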

Now let $R_y^- = \lim_{y \uparrow y_0} R_y(y, s_0)$. By (3.2), the corresponding right limit is
$R_y^+ = D(s_0)$, since it is approached through $\mathcal{A}$. We note that all the costs incurred
during the period of interest can be evaluated according to $D(s_0)$, whereas the
final position must lie in $\Omega$, so that the corresponding risk involves $R_y^-$. Then

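(3.8) $R(y_0, s_0) = E\bigl[M(s_0 - \delta)D(s_0) + R(Y(s_0 - \delta) - M(s_0 - \delta),\ s_0 - \delta)\bigr] + o(\delta^{1/2})$

$\qquad = D(s_0)E[M(s_0 - \delta)] + E\bigl[R(y_0, s_0) + R_y^-\{\delta Y - M(s_0 - \delta)\}\bigr] + o(\delta^{1/2}).$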
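On collecting the terms of order $\delta^{1/2}$, and using $E(\delta Y) = 0$, we are left with

$(R_y^- - D(s_0))\,E[M(s_0 - \delta)] = o(\delta^{1/2}),$

and since $E[M(s_0 - \delta)]$ is of exact order $\delta^{1/2}$ by (3.7), it follows that $R_y^- = D(s_0)$.
Thus $R_y$ is continuous at the boundary, and hence $R_s$ must be also. In particular,

(3.9) $R_y = D, \qquad (y = \tilde y(s)).$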

It is important to recognize that relations (3.2), (3.3), and (3.8) do not depend
in any way on the optimality of the policy. In fact, all three can be applied to
the risk function for an arbitrary policy with specified boundaries. For the
optimal policy, there is an extra condition which will be useful in locating the
curve $\tilde y(s)$: the second derivative $R_{yy}$ must be continuous at the boundary of $\Omega$.
This means that

(3.10) $R_{yy} = 0, \qquad (y = \pm\tilde y(s)).$

To verify this necessary condition for optimality, we again consider a starting
point $(y_0, s_0)$ with $y_0 = \tilde y(s_0)$. But now let us modify the optimal policy locally:
no action is permitted during the period $s_0 \ge s > s_0 - \delta$, but the original procedure
must be resumed at $s_0 - \delta$. The initial risk for the modified policy is
$R^{(\delta)}(y_0, s_0) = E[R(y_0 + \delta Y, s_0 - \delta)]$.

In view of (3.9), there will be no terms of order $\delta^{1/2}$ here, and the expectation
can be dealt with almost as before, in the derivation of equation (3.3). However,
in this case the term involving $\delta Y^2$ needs special treatment. Let $R_{yy}^-$ and $R_{yy}^+$
denote the one-sided second derivatives evaluated at $(y_0, s_0)$ by suitable limiting
operations. We obtain

(3.11) $R^{(\delta)}(y_0, s_0) = R(y_0, s_0) + R_y(y_0, s_0)E(\delta Y) + \tfrac12 R_{yy}^- E(\delta Y^2) - R_s(y_0, s_0)\,\delta$

$\qquad + \tfrac12(R_{yy}^+ - R_{yy}^-)\,E\bigl(\delta Y^2;\ \delta Y > \tilde y(s_0 - \delta) - \tilde y(s_0)\bigr) + o(\delta).$
Equation (3.3) holds as $(y, s_0)$ approaches $(y_0, s_0)$ from below, and continuity
ensures that $\tfrac12 R_{yy}^- = R_s(y_0, s_0)$. Also, the restricted expectation can be replaced
by $E(\delta Y^2;\ \delta Y > 0) + o(\delta) = \tfrac12\delta + o(\delta)$, and what remains of the expansion is

(3.12) $R^{(\delta)}(y_0, s_0) = R(y_0, s_0) + \tfrac14(R_{yy}^+ - R_{yy}^-)\,\delta + o(\delta).$

The modified policy is suboptimal, and hence $R^{(\delta)}(y_0, s_0) \ge R(y_0, s_0)$ no matter
what the value of $\delta$. It follows that $R_{yy}^+ \ge R_{yy}^-$. On the other hand, relation (3.2)
indicates that $R_{yy}^+ = 0$, but conditions (3.1) and (3.9) together imply that
$R_{yy}^- \ge 0$. These inequalities are compatible only if $R_{yy}^- = R_{yy}^+ = 0$.

For a given boundary, equations (3.3) and (3.9) determine a solution of the
diffusion equation. In our problem the boundary is not known, and, hopefully,
the extra boundary condition (3.10) determines it. Equations (3.3),
(3.9), and (3.10) are said to define a free boundary problem, and we have shown
that a well-behaved solution of the optimization problem is a solution of the free
boundary problem. Essentially this result was independently and previously
derived by C. T. Striebel [8]. Somewhat more crucial to our applications are the
conditions, discussed in section 4, under which a solution of the free boundary
problem is a solution of the optimality problem.

4. An associated problem

Having established the most useful properties of $R(y, s)$, we shall employ
them in a rather indirect way. In section 1, we introduced another problem, with
minimum risk function $V(y, s)$ for positions in the positive quadrant. This problem
is conceptually simpler, because a decision to act is necessarily final, and the
corresponding instantaneous cost $D(s)$ can be compared with the expected cost
of continuing. Hence the appropriate boundary curve $\tilde y(s)$ and the two axes
can be treated as absorbing barriers. There is no need to consider the process
$\{Y(s)\}$ after absorption takes place. Nevertheless, we shall see that the two
problems are equivalent in the sense that the same continuation region is optimal
for both. For the moment, let $\tilde y(s)$ represent the optimal policy corresponding to
$V(y, s)$.

The formal properties of $V(y, s)$ are listed below.

(4.1)  $\tfrac12 V_{yy} = V_s, \qquad (0 < y < y(s)),$

(4.2)  $V = D, \qquad (y > y(s)),$

(4.3)  $V_y = 0, \qquad (y = y(s)).$

These conditions are analogous to (3.3), (3.9), and (3.10) respectively, and can be derived similarly. Notice, however, that (4.2) is intuitively obvious. As before, only the last is an optimality condition. In addition, the automatic termination of the path when either axis is reached leads to the specified costs

(4.4)  $V(0, s) = 0, \qquad (s > 0),$

(4.5)  $V(y, 0) = R_y(y, 0), \qquad (y > 0).$
We now observe that all these properties are satisfied by the function $R_y(y, s)$, $y, s > 0$. Equation (4.1) follows by differentiating (3.3), and the rest are directly applicable.

From a more fundamental point of view, it is clear that a solution to the associated problem must provide a solution to the original. If $V(y, s)$ represents the local minimum expected cost for every position $(y, s)$ in the positive quadrant, then the function $R(y, s)$ obtained by setting $R_y(y, s) = V(y, s)$ must be minimal. More precisely,

(4.6)  $R(y, s) = \int_0^{|y|} R_y(y', s)\,dy' + \int_0^{s} R_s(0, s')\,ds' + R(0, 0),$

(4.7)  $R(y, s) = \int_0^{|y|} V(y', s)\,dy' + \int_0^{s} \tfrac12 V_y(0, s')\,ds' + R(0, 0).$

Here we use the fact that

(4.8)  $R_s(0, s') = \tfrac12 R_{yy}(0, s') = \tfrac12 V_y(0, s').$

Since $V_y(0, s') = \lim_{h \downarrow 0} h^{-1} V(h, s')$, each of the above integrands is everywhere minimal and the conclusion follows.

It remains to check whether we have specified enough conditions to determine the function $V(y, s)$. In general, the properties (4.1)-(4.5) are not sufficient, because (4.3) does not fully represent the optimality of the policy. We shall impose two further optimality conditions in order to ensure that there is at most one solution. It was remarked earlier that the optimal policy can be defined by the inequality

(4.9)  $V < D, \qquad (0 < y < y(s)).$

Again, for any position $(y, s)$ and $0 < \delta < s$, consider the suboptimal modified procedure: continue for a period of length $\delta$ and then resume the optimal policy. In particular, for points $(y, s)$ in the optimal stopping region, we have

(4.10)  $E[V(y + \delta Y, s - \delta)] \ge D(s), \qquad (y > y(s)).$

Strictly speaking, the path should terminate if it crosses the $s$-axis, but we can alternatively and equivalently treat the integrand as an odd function of $y + \delta Y$. The boundary condition (4.3) is a convenient, but very special, form of (4.10). The general condition is needed in order to complete the characterization of $y(s)$.

We have regarded the function $V(y, s)$ as the minimum risk for each separate position. But it has been assumed that there is a single policy which attains this minimum everywhere and determines $V(y, s)$. More importantly, we have assumed in deriving its formal properties that $V(y, s)$ is suitably differentiable. In order to justify the approach, let us distinguish temporarily between the extremal and the formal properties of $V(y, s)$.

In what follows, let $V(y, s)$ denote a risk function which satisfies the formal conditions (4.1)-(4.5) and (4.9)-(4.10), assuming that such a solution exists. Let $V^*(y, s)$ be the infimum, over all control procedures, of the risk at the point $(y, s)$.
Thus, given any position $(y_0, s_0)$ and any $\epsilon_0 > 0$, there exists a policy with risk function $V^{(0)}(y, s)$ such that

(4.11)  $V^*(y_0, s_0) \le V^{(0)}(y_0, s_0) < V^*(y_0, s_0) + \epsilon_0.$

We now prove that $V(y, s) = V^*(y, s)$, which means that the minimization problem will be properly treated provided that we can find the formal solution. Although the existence of $V(y, s)$ is in general open to question, it can sometimes be found explicitly, and we shall rely on this later for certain special cases connected with our main application. The equivalence of the definitions can be established from the fact that $V(y, 0) = V^*(y, 0)$, but since the cost function $D(s)$, continuous for $s > 0$, may be unbounded as $s \to 0$, it is convenient to assume further that

(4.12)  $\sup_y\,[V(y, s) - V^*(y, s)] \to 0 \quad \text{as } s \to 0.$

In practice, this is not difficult to verify by finding a crude approximation to $V^*(y, s)$. The following argument is based essentially on that given in [2], section 6.

Lemma 4.1. If $V(y, s)$ is a risk function which satisfies conditions (4.1)-(4.5), (4.9)-(4.10) and (4.12), and if $V^*(y, s)$ is the minimum risk function, then

$V(y, s) = V^*(y, s), \qquad (y, s > 0).$

Proof. Consider any fixed position $(y_0, s_0)$ with $y_0, s_0 > 0$ and let $\epsilon_0, \epsilon_1, \epsilon_2$ be arbitrary positive numbers. By (4.11), we can find a policy such that its risk function satisfies

(4.13)  $V^{(0)}(y_0, s_0) < V^*(y_0, s_0) + \epsilon_0.$

Now choose $s_1 < s_0$, using assumption (4.12), so that for every $y > 0$,

(4.14)  $V(y, s_1) < V^*(y, s_1) + \epsilon_1 \le V^{(0)}(y, s_1) + \epsilon_1.$

The function $D(s)$ is uniformly continuous on the closed interval $[s_1, s_0]$. Hence there is a $\delta = (s_0 - s_1)/n$, for some integer $n > 0$, which ensures that

$\sup_{|s - s'| < \delta} |D(s) - D(s')| < \epsilon_2$

within the interval.

We now restrict attention to the period $s_0 \ge s \ge s_1$ and consider two procedures in which stopping is permitted only at the instants $s_0 = s_1 + n\delta,\ s_1 + (n-1)\delta,\ \dots,\ s_1$. Automatic stops on the $s$-axis can be included in this restriction by extending the appropriate risks as odd functions of $y$. Let $V_\delta(y, s)$ represent the risk of the optimal discrete procedure determined from the final cost $V_\delta(y, s_1) = V(y, s_1)$. Similarly, let $V_\delta^{(0)}(y, s)$ be the minimum risk when $V_\delta^{(0)}(y, s_1) = V^{(0)}(y, s_1)$. Related to these is the function $V_+^{(0)}(y, s)$, defined as follows. We make a slight modification of the stopping cost $D(s)$, but not of the continuous policy associated with $V^{(0)}(y, s)$. Whenever any optional stop occurs, the cost is determined according to the next discrete instant. Thus $D(s)$ is replaced by $D(s')$, where $s' = s_1 + k\delta$ for some integer $k$ and $s \ge s' > s - \delta$. It follows from our choice of $\delta$ that the extra cost can never exceed $\epsilon_2$, and hence

(4.15)  $V_+^{(0)}(y_0, s_0) \le V^{(0)}(y_0, s_0) + \epsilon_2.$

On the other hand, $V_+^{(0)}(y, s)$ can be regarded as the risk function for a certain discrete procedure. The same continuous procedure as before can be applied, with the provision that stopping always takes place at the next discrete instant. This means that part of the path will be disregarded. Since $V_\delta^{(0)}(y, s)$ represents the minimum risk for the restriction to discrete time, we have

(4.16)  $V_\delta^{(0)}(y_0, s_0) \le V_+^{(0)}(y_0, s_0).$

The relation between $V_\delta^{(0)}(y, s)$ and $V_\delta(y, s)$ is a consequence of our choice of $s_1$. We have $V_\delta(y, s_1) < V_\delta^{(0)}(y, s_1) + \epsilon_1$, $(y > 0)$. Then $V_\delta(y, s_1 + \delta)$ and $V_\delta^{(0)}(y, s_1 + \delta)$ can be evaluated in terms of these quantities, and the inequality is preserved. After $n$ repetitions of this technique, we obtain

(4.17)  $V_\delta(y_0, s_0) < V_\delta^{(0)}(y_0, s_0) + \epsilon_1.$

Finally, we must make use of the properties of $V(y, s)$ to show that

(4.18)  $V(y_0, s_0) \le V_\delta(y_0, s_0).$

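Assume inductively that for some $k \ge 0$, $V(y, s_1 + k\delta) \le V_\delta(y, s_1 + k\delta)$, $(y > 0)$. When $k = 0$, we have equality. In order to extend the result to the case $k + 1$, we note first that

(4.19)  $V_\delta(y, s_1 + (k+1)\delta) = \min\,[D(s_1 + (k+1)\delta),\; E\{V_\delta(y + \delta Y, s_1 + k\delta)\}].$

If $y \ge y(s_1 + (k+1)\delta)$, then

(4.20)  $E\{V_\delta(y + \delta Y, s_1 + k\delta)\} \ge E\{V(y + \delta Y, s_1 + k\delta)\} \ge D(s_1 + (k+1)\delta)$

by the induction hypothesis and (4.10). Since $V = D$ at such positions, both terms in (4.19) are at least $V(y, s_1 + (k+1)\delta)$, and hence $V(y, s_1 + (k+1)\delta) \le V_\delta(y, s_1 + (k+1)\delta)$. A similar argument shows that

(4.21)  $V(y', s') \le V_\delta(y', s'), \qquad (y' = y(s');\ s_1 + (k+1)\delta > s' > s_1 + k\delta).$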

To obtain the corresponding inequality for positions $(y, s_1 + (k+1)\delta)$ with $0 < y < y(s_1 + (k+1)\delta)$, it is enough to show that

$V(y, s_1 + (k+1)\delta) \le E\{V_\delta(y + \delta Y, s_1 + k\delta)\},$

since condition (4.9) applies. The right-hand side can be evaluated as a conditional expectation as follows: $E\{V_\delta(y + \delta Y, s_1 + k\delta)\} = E\{V_\delta(Y', S')\}$, where $(Y', S')$ is the point where the Wiener process through $(y, s_1 + (k+1)\delta)$ first hits the barrier consisting of the curve $y(s)$, the line $s = s_1 + k\delta$, and the $s$-axis. But $V(y, s)$ is a solution of the diffusion equation in the region lying between these curves and is sufficiently well behaved (see [2], section 6) to justify a similar expression, $V(y, s_1 + (k+1)\delta) = E\{V(Y', S')\}$. Then the required inequality is valid if $V(y', s') \le V_\delta(y', s')$ at every point of the barrier. But this has already been verified for the case $y' = y(s')$, and it certainly holds along the two linear sections. The induction is now complete and (4.18) follows.
We conclude from the inequalities (4.13) and (4.15)-(4.18) that

(4.23)  $V(y_0, s_0) < V^*(y_0, s_0) + \epsilon_0 + \epsilon_1 + \epsilon_2.$

Then, since $\epsilon_0, \epsilon_1, \epsilon_2$ were chosen arbitrarily, we obtain $V(y_0, s_0) \le V^*(y_0, s_0)$, and this can only mean equality.
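The discrete recursion (4.19) used in this proof is also a practical way to approximate $V$ and the boundary $y(s)$ numerically. The sketch below is our own illustration, not the authors' method: it assumes the costs (5.1) of the next section, $D(s) = 1/s$ and $V(y, 0) = y$, replaces the Wiener increment $\delta Y$ by a coin-flip step of $\pm\delta^{1/2}$, and truncates the $y$-grid at a level `y_max` where $V$ is set to $\min(y_{\max}, 1/s)$. The function name `discrete_stopping_boundary` and its default parameters are hypothetical choices.

```python
import math


def discrete_stopping_boundary(s_max=6.0, delta=0.0025, y_max=12.0):
    """Backward induction for the discrete-time recursion (4.19).

    Assumptions of this sketch: D(s) = 1/s and V(y, 0) = y as in (5.1); the
    Wiener increment over a step of length delta is replaced by a step of
    +/- sqrt(delta) with probability 1/2 each; the y-grid is cut off at y_max,
    where V is set to min(y_max, 1/s).  Entry k of the returned list is the
    smallest grid level at which stopping is optimal at s = k * delta, or None
    if no grid level below y_max is a stopping point.
    """
    dy = math.sqrt(delta)
    n_y = int(round(y_max / dy))
    ys = [i * dy for i in range(n_y + 1)]
    v = ys[:]              # V(y, 0) = y
    boundary = [None]      # s = 0: terminal cost only, no optional stop
    for k in range(1, int(round(s_max / delta)) + 1):
        stop_cost = 1.0 / (k * delta)
        new_v = [0.0] * (n_y + 1)  # V(0, s) = 0 (absorption on the s-axis)
        first_stop = None
        for i in range(1, n_y):
            # Expected risk after one more step; v[0] = 0 is the value the odd
            # extension of V in y assigns to the s-axis.
            cont = 0.5 * (v[i + 1] + v[i - 1])
            if stop_cost <= cont:
                new_v[i] = stop_cost
                if first_stop is None:
                    first_stop = ys[i]
            else:
                new_v[i] = cont
        new_v[n_y] = min(ys[n_y], stop_cost)  # crude condition at the grid top
        v = new_v
        boundary.append(first_stop)
    return boundary


b = discrete_stopping_boundary()
print("estimated y(1):", b[400], "  estimated y(4):", b[1600])
```

Entry `k` of the result corresponds to $s = k\delta$, so with the defaults `b[400]` and `b[1600]` are grid estimates of $y(1)$ and $y(4)$ that can be compared with the bounds collected in Table I. The resolution is of order $\delta^{1/2}$; a smaller `delta` refines it at a quadratic cost in run time.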

5.  Approximations  to  the  optimal  boundary 

In practice, we hope that conditions (4.1)-(4.5) and (4.9) will be sufficient to determine $V(y, s)$ and $y(s)$ from the given cost functions $D(s)$ and $R_y(y, 0)$. However, the methods used here to find approximations to the curve $y(s)$ involve a reversal of the natural direction of inference. We shall consider various solutions of the diffusion equation and investigate the variations of our minimization problem determined from them in such a way that the appropriate properties hold by definition. The cost functions of these variations might seem irrelevant, but it is possible to arrange useful comparisons with the given $D(s)$ and $R_y(y, 0)$, so that a relation between the artificial policy and $y(s)$ can be inferred. In this connection, it is advantageous to think in terms of $V(y, s)$ rather than the original risk $R(y, s)$.

We confine our application of the above techniques to the case when

(5.1)  $D(s) = s^{-1}, \qquad R_y(y, 0) = y.$

It is interesting that, because $R_{yy}(y, 0) = 1$ here, the function $R_{yy}(y, s)$ has a special interpretation. Conditions (4.1) and (4.3) suggest that for $|y| < y(s)$, this second derivative represents the probability that the Wiener process escapes to the $y$-axis without hitting either boundary curve. Thus, we are attempting to select $y(s)$ so as to minimize the escape probability for every initial position, subject to an integral constraint given by (4.2). Much of the analysis which follows could also be developed from this point of view.

The solutions of the diffusion equation discussed here are all generated by the relation

(5.2)  $V^{(\alpha)}(y, s) = E[V^{(\alpha)}(Y(0), 0) \mid Y(s) = y], \qquad (y, s > 0),$

with suitably selected functions $V^{(\alpha)}(y, 0)$. The obvious choice indicated by (5.1) leads to $V^{(1)}(y, s) = y$. This satisfies equation (4.1) trivially.

If we now set $y^{(1)}(s) = 1/s$ and make the modification $V^{(1)}(y, s) = 1/s$ when $y > 1/s$, then conditions (4.2), (4.4), and (4.5) are satisfied. It can be shown that $V^{(1)}(y, s)$ is uniquely determined by these four properties, given the boundary curve $y^{(1)}(s)$ (see [1], section 4). Then we may conclude that $V^{(1)}(y, s)$ is the risk function which corresponds to this boundary. Condition (4.3) does not hold, so the policy is suboptimal. On the other hand, (4.9) is satisfied, and whenever $y < 1/s$, we have $V(y, s) \le V^{(1)}(y, s) < D(s)$. It follows that

(5.3)  $y(s) \ge y^{(1)}(s) = 1/s.$
We remark that the same risk function $V^{(1)}(y, s)$ also represents a quite different discrete procedure. This other policy is optimal when it must be decided either to stop immediately at $(y, s)$ or not to stop at all until one of the axes is reached.

We now consider another solution of the diffusion equation, given by

(5.4)  $V^{(2)}(y, s) = A\varphi(y s^{-1/2})\, y s^{-3/2}, \quad \text{where } \varphi(u) = (2\pi)^{-1/2} e^{-u^2/2}.$

Let the constant $A = 1/\varphi(1)$, so that $V^{(2)}(y, s) = s^{-1}$ along the curve $y^{(2)}(s) = s^{1/2}$. Further, we have

(5.5)  $V^{(2)}_y(y, s) = A\varphi(y s^{-1/2})\, s^{-3/2}(1 - y^2 s^{-1}),$

which vanishes when $y = s^{1/2}$. Hence, if we redefine the risk outside this curve by $V^{(2)}(y, s) = s^{-1}$, conditions (4.1)-(4.4) and (4.9) are all satisfied. The procedure specified by the curve $y^{(2)}(s)$ is not optimal for the cost function $R_y(y, 0) = y$. However, it is a useful policy whenever $s > 0$, since every path will be stopped before it reaches the $y$-axis. The corresponding risk function satisfies (4.9), and we can obtain an inner approximation just as before:

(5.6)  $y(s) \ge y^{(2)}(s) = s^{1/2}.$

Finally, we note that the special policy determined by $y^{(2)}(s)$ would be optimal if the terminal cost $R_y(y, 0)$ had been infinite for all $y < 0$. This reinforces the remarks made at the end of section 2.
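The construction (5.4)-(5.6) is easy to check numerically. The following sketch is our own (standard library only; the names `v2_inside`, `v2`, `heat_residual` and the step sizes are illustrative choices): it evaluates $V^{(2)}$, confirms that it equals $s^{-1}$ with zero slope on the curve $y = s^{1/2}$, and estimates $\tfrac12 V_{yy} - V_s$ inside the curve by central differences.

```python
import math


def phi(u):
    """Standard normal density used in (5.4)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


A = 1.0 / phi(1.0)  # the choice A = 1/phi(1) made after (5.4)


def v2_inside(y, s):
    """Formula (5.4) for V^(2)(y, s), used for 0 < y <= s**0.5."""
    return A * phi(y / math.sqrt(s)) * y * s ** -1.5


def v2(y, s):
    """Risk of the policy y^(2)(s) = s**0.5: formula (5.4) inside, 1/s outside."""
    return v2_inside(y, s) if y < math.sqrt(s) else 1.0 / s


def heat_residual(f, y, s, h=1e-3):
    """Central-difference estimate of (1/2) f_yy - f_s at (y, s)."""
    f_yy = (f(y + h, s) - 2.0 * f(y, s) + f(y - h, s)) / (h * h)
    f_s = (f(y, s + h) - f(y, s - h)) / (2.0 * h)
    return 0.5 * f_yy - f_s


for s in (0.5, 1.0, 4.0):
    edge = math.sqrt(s)
    slope = (v2_inside(edge + 1e-6, s) - v2_inside(edge - 1e-6, s)) / 2e-6
    print(f"s = {s}: V2 - 1/s on the curve = {v2_inside(edge, s) - 1.0 / s:.1e}, "
          f"slope there = {slope:.1e}, "
          f"heat residual at y = s**0.5 / 2: {heat_residual(v2_inside, 0.5 * edge, s):.1e}")
```

The residual check uses the unmodified formula `v2_inside`, since the redefinition by $s^{-1}$ outside the curve is only continuous with a continuous first derivative there.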

The inequalities (5.3) and (5.6) together provide a fairly accurate inner approximation to the whole curve $y(s)$. But in order to show this, we must find suitable outer approximations. Here, the investigation of special suboptimal policies is no longer enough. Roughly speaking, we need to consider procedures which are optimal in situations where the decision maker is encouraged to continue by a reduction of future costs.

The last example can be modified to produce such a procedure. Let

(5.7)  $V^{(3)}(y, s) = A\varphi(y(s+h)^{-1/2})\, y (s+h)^{-3/2}, \qquad (y, s > 0),$

where $A, h > 0$ are parameters of the solution. For any fixed value of $h$, let us choose $A$ in such a way that $V^{(3)}(y, 0) \le y$ whenever $y > 0$. Since

(5.8)  $V^{(3)}_y(y, s) = A\varphi(y(s+h)^{-1/2})\,(s+h)^{-3/2}\{1 - y^2 (s+h)^{-1}\},$

which does not exceed $A\varphi(0)/h^{3/2}$ along the line $s = 0$, this can be achieved by setting $A = h^{3/2}/\varphi(0)$.

Consider the procedure determined by the curve $y_h(s) = (s+h)^{1/2}$, $s > 0$. If we imagine that the cost functions (5.1) are replaced by

(5.9)  $V^{(3)}((s+h)^{1/2}, s) = A\varphi(1)(s+h)^{-1}, \qquad V^{(3)}(y, 0) = A\varphi(y h^{-1/2})\, y h^{-3/2}$

respectively, then we have an optimal policy by reference to the lemma of section 4. The risk function $V^{(3)}(y, s)$ can be modified according to (4.2) outside the continuation region, and then all the conditions (4.1)-(4.5), (4.9) and (4.12) are satisfied.

It only remains to compare this auxiliary problem with the original. We observe that the stopping cost $A\varphi(1)/(s+h) \le 1/s$, provided that $0 < s \le s_h$, with equality at the instant $s_h = h/(A\varphi(1) - 1)$.

Now consider the boundary position $(y_h, s_h)$ for the optimal policy indexed by $h$. According to this constructed policy, the minimum risk attainable under the specifications (5.9) is $V^{(3)}(y_h, s_h) = 1/s_h$. But the actual cost incurred by stopping at any future time $s$ with $s_h > s > 0$, or at $s = 0$, does not exceed that prescribed by (5.1). It follows that $V(y_h, s_h) \ge V^{(3)}(y_h, s_h) = D(s_h)$, and hence $y_h \ge y(s_h)$. Thus, as $h$ varies, the point $(y_h, s_h)$ describes a curve $y^{(3)}(s)$, say, which is an outer approximation to the required optimal boundary. The form of $y^{(3)}(s)$ is implicit in our special choice of $A$, $y_h(s)$ and $s_h$:

(5.10)  $s = e^{1/2} h\,(h^{3/2} - e^{1/2})^{-1}, \qquad y = (s+h)^{1/2}, \qquad (h > e^{1/3}).$


It is easily verified that $s$ increases through every positive value as $h$ decreases from $\infty$ to $e^{1/3}$. In particular, the following asymptotic formulae can be deduced.

(5.11)  $y(s) \le y^{(3)}(s) = s^{1/2}\{1 + \tfrac12 e^{1/3} s^{-1} + O(s^{-2})\}, \qquad (s \to \infty),$

(5.12)  $y(s) \le y^{(3)}(s) = e^{1/2} s^{-1}\{1 + O(s^3)\}, \qquad (s \to 0).$

The first of these, taken with (5.6), shows that

(5.13)  $y(s) = s^{1/2}\{1 + O(s^{-1})\}, \qquad (s \to \infty).$

However, the second does not match so well with (5.3). Another outer approximation will be constructed to give a more precise description of the optimal boundary when $s$ is small.
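The outer curve (5.10) is given only parametrically in $h$. A minimal way to evaluate $y^{(3)}(s)$ at a prescribed $s$ is to invert the first relation of (5.10) by bisection, using the fact noted above that $s$ runs monotonically from $\infty$ down to $0$ as $h$ increases from $e^{1/3}$. The sketch below is our own; the function names, the initial bracket and the tolerance are illustrative assumptions.

```python
import math

E_HALF = math.exp(0.5)


def s_of_h(h):
    """First relation of (5.10): s as a function of h > e**(1/3)."""
    denom = h ** 1.5 - E_HALF
    return math.inf if denom <= 0.0 else E_HALF * h / denom


def y3(s, rel_tol=1e-12):
    """Evaluate the outer curve y^(3)(s) of (5.10) by inverting s_of_h.

    s_of_h decreases from +infinity to 0 as h increases from e**(1/3), so
    [e**(1/3), hi] brackets the solution once s_of_h(hi) <= s.
    """
    lo, hi = math.exp(1.0 / 3.0), 2.0
    while s_of_h(hi) > s:
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if s_of_h(mid) > s:
            lo = mid
        else:
            hi = mid
    h = 0.5 * (lo + hi)
    return math.sqrt(s + h)  # second relation of (5.10)


for s in (0.05, 1.0, 4.0, 10.0):
    print(f"s = {s:5.2f}  y3 = {y3(s):.4f}")
```

For small $s$ the output can be compared with the leading term $e^{1/2}s^{-1}$ of (5.12), and for large $s$ with $s^{1/2} + \tfrac12 e^{1/3} s^{-1/2}$ from (5.11).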

The following solution of the diffusion equation generates a useful auxiliary problem.

(5.14)  $V^{(4)}(y, s) = y - B e^{\beta^2 s/2} \sinh(\beta y), \qquad (y, s > 0).$

We aim to choose the parameters $B, \beta > 0$ for a particular instant $s$ and, by making the proper comparison, find the level $y^{(4)}(s) \ge y(s)$ which provides a best local approximation. But it is more convenient to study the auxiliary problem first in its general form and then try to pick out a special position where the comparison is most relevant. We note that

(5.15)  $V^{(4)}_y(y, s) = 1 - B\beta e^{\beta^2 s/2} \cosh(\beta y),$

(5.16)  $V^{(4)}_{yy}(y, s) = -B\beta^2 e^{\beta^2 s/2} \sinh(\beta y).$


The optimal boundary for the auxiliary problem is determined simply by setting $V^{(4)}_y(y, s) = 0$, so that $B\beta \cosh(\beta y) = e^{-\beta^2 s/2}$.

Provided that $B$ is not too large, the corresponding continuation region is bounded. Let us denote the boundary curve by $y_\beta(s)$ and restrict attention to positions $(y, s)$ with $y_\beta(s) > 0$. As before, the risk outside the continuation region must be determined by applying condition (4.2) with $D$ replaced by an appropriate $D^{(4)}(s)$. In spite of this, the relation $V^{(4)}(y, 0) \le y$ remains valid, and it is enough to examine the auxiliary stopping cost along the boundary with reference to (5.1).

Consider the difference $V^{(4)}(y_\beta(s), s) - s^{-1}$ as $s$ increases from zero as far as the instant when $y_\beta(s) = 0$. The quantity is negative at both ends of the curve, but it is clear that we can adjust its maximum value by choosing $B$. Suppose now that $B$ is selected to make this maximum value zero. A necessary condition is that the differential along the curve should vanish, and by applying the diffusion equation, it can be expressed as

(5.17)  $0 = V^{(4)}_y(y_\beta(s), s)\,dy + \{V^{(4)}_s(y_\beta(s), s) + s^{-2}\}\,ds,$

(5.18)  $0 = \{\tfrac12 V^{(4)}_{yy}(y_\beta(s), s) + s^{-2}\}\,ds.$

Let $(y_\beta, s_\beta)$ be a point on the curve at which the maximum is attained. For this position, the future costs associated with the auxiliary problem are uniformly less than those given by (5.1), but the present stopping cost is the same. Consequently we have $y_\beta \ge y(s_\beta)$.

The above construction implies that the following equations must hold simultaneously at the special position $(y, s) = (y_\beta, s_\beta)$:

(5.19)  $V^{(4)}(y, s) = s^{-1}, \qquad V^{(4)}_y(y, s) = 0, \qquad V^{(4)}_{yy}(y, s) = -2s^{-2}.$

Fortunately, these three properties are sufficient to define the construction. In fact, there is no difficulty in eliminating the parameters $B$ and $\beta$. We have

(5.20)  $B e^{\beta^2 s/2} \sinh(\beta y) = y - s^{-1}, \qquad B\beta e^{\beta^2 s/2} \cosh(\beta y) = 1, \qquad -B\beta^2 e^{\beta^2 s/2} \sinh(\beta y) = -2s^{-2}.$

Dividing the third equation by the first gives $\beta^2 = 2s^{-2}(y - s^{-1})^{-1}$, and dividing the first by the second gives $\tanh(\beta y) = \beta(y - s^{-1})$. The elimination therefore leads to a relation between $y$ and $s$, which defines the required outer approximation $y^{(4)}(s)$ for all values of $s$:

(5.21)  $2^{1/2} s^{-1} (y - s^{-1})^{1/2} - \tanh\{2^{1/2} y s^{-1} (y - s^{-1})^{-1/2}\} = 0.$

It is a straightforward matter to verify that the expression on the left is a strictly increasing function of $y$ in the range $y > s^{-1}$, and to deduce the existence of a unique zero at $y = y^{(4)}(s)$.

The effectiveness of the approximation is suggested by simpler formulae which can be derived from (5.21) for extreme values of $s$. A limited expansion of the hyperbolic tangent can be used to show that

(5.22)  $y^{(4)}(s) = (\tfrac32)^{1/2} s^{1/2}\{1 + O(s^{-3/2})\}, \qquad (s \to \infty).$

In this case, according to (5.13), the approximation is too large by a factor $(\tfrac32)^{1/2}$. But for small values of $s$, the result is much more satisfactory. Since $\tanh(u) < 1$ always, equation (5.21) yields the inequality $y^{(4)}(s) < 1/s + \tfrac12 s^2$. Then by making use of $y^{(1)}(s)$, we obtain

(5.23)  $\dfrac{1}{s} \le y(s) < \dfrac{1}{s} + \tfrac12 s^2, \qquad (s > 0).$

198  FIFTH  BERKELEY  SYMPOSIUM:  BATHER  AND  CHERNOFF 

In particular, the difference between these two bounds approaches zero rapidly as $s \to 0$. The above inequality for $y^{(4)}(s)$ leads to a more robust formula by substituting for $y$ in the second term of (5.21):

(5.24)  $y^{(4)}(s) \approx \dfrac{1}{s} + \tfrac12 s^2 \tanh^2\{1 + 2s^{-3}\},$

which is fairly accurate when $s < 1$.
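Because (5.21) is a single equation in one unknown, $y^{(4)}(s)$ is easy to compute. The sketch below is our own (names and iteration count are illustrative): it solves (5.21) by bisection in $w = y - s^{-1}$ on $(0, \tfrac12 s^2]$, an interval that brackets the root by the argument leading to (5.23), since the left side of (5.21) tends to $-1$ as $w \to 0$ and is positive at $w = \tfrac12 s^2$. It also evaluates (5.24) for comparison.

```python
import math

SQRT2 = math.sqrt(2.0)


def y4(s, iterations=100):
    """Root y^(4)(s) of (5.21), by bisection on w = y - 1/s over (0, s**2 / 2]."""

    def lhs(w):
        # Left side of (5.21) written in terms of w = y - 1/s > 0.
        return SQRT2 / s * math.sqrt(w) - math.tanh(SQRT2 * (1.0 / s + w) / (s * math.sqrt(w)))

    # lhs -> -1 as w -> 0+, and lhs(s**2 / 2) = 1 - tanh(...) > 0 mathematically.
    lo, hi = 0.0, 0.5 * s * s
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if lhs(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 1.0 / s + 0.5 * (lo + hi)


def y4_approx(s):
    """Approximation (5.24) to y^(4)(s)."""
    return 1.0 / s + 0.5 * s * s * math.tanh(1.0 + 2.0 / s ** 3) ** 2


for s in (0.5, 1.0, 4.0, 10.0):
    print(f"s = {s:5.2f}  y4 = {y4(s):.4f}  (5.24) gives {y4_approx(s):.4f}")
```

Working in $w$ rather than $y$ avoids the cancellation in $y - s^{-1}$ when $s$ is small and $y$ is close to $s^{-1}$.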

The table below contains a summary of our results so far, giving each of the four approximations for several values of $s$. The first two functions tabulated are lower bounds and the others are upper bounds.

TABLE I
Bounds on $y(s)$

s                          4.0     6.0     8.0     10.0
$y^{(1)}(s) = 1/s$         0.25    0.17    0.13    0.10
$y^{(2)}(s) = s^{1/2}$     2.00    2.45    2.83    3.16
$y^{(3)}(s)$               2.40    2.73    3.07    3.38
$y^{(4)}(s)$               2.48    3.02    3.48    3.88


6.  An  auxiliary  problem 

The inner and outer bounds on $y(s)$ obtained in the last section leave the unknown boundary curve covered by a narrow strip. The table indicates that we already have a reasonably accurate determination of the optimal policy for all values of $s > 0$. Nevertheless, it is of interest to investigate whether the techniques can be developed further. In what follows, we seek more precise asymptotic descriptions of $y(s)$, first as $s \to 0$ and later as $s \to \infty$.

It will be convenient to change the notation slightly and denote the given stopping and terminal costs together by

(6.1)  $D(y, s) = s^{-1}, \qquad (y > 0,\ s > 0),$
       $D(y, 0) = y, \qquad (y > 0),$
       $D(0, s) = 0, \qquad (s > 0).$

Consider the auxiliary stopping problem for the Wiener process $\{Y(s), s > 0\}$ in the half plane $s > 0$, with the following stopping and terminal cost function:

(6.2)  $d(y, s) = -s, \qquad (s > 0),$
       $d(y, 0) = 0, \qquad (y > 0),$
       $d(y, 0) = y, \qquad (y < 0).$
Lemma 6.1. As $s \to \infty$, the minimum risk

(6.3)  $v(y, s) = y - \tfrac12 e^{2y + 2s - 1} + o(1),$

where $o(1)$ is positive and applies uniformly in $y$. The optimal policy consists of stopping whenever $y \ge z(s)$, where $-s + \tfrac12 + o(1) \le z(s) \le -s + \tfrac12$.

Proof. We note that $v'(y, s) = y - \tfrac12 e^{2y + 2s - 1}$ satisfies the diffusion equation with boundary conditions

$v'(y, s) = -s, \qquad \text{for } y = -s + \tfrac12,\ s > 0,$
$v'_y(y, s) = 0, \qquad \text{for } y = -s + \tfrac12,\ s > 0,$
$v'(y, 0) = y - \tfrac12 e^{2y - 1}, \qquad \text{for } y \le \tfrac12,\ s = 0.$

Thus $v'(y, s)$ and $z'(s) = -s + \tfrac12$ represent the solution of a more favorable problem, since $v'(y, 0) \le d(y, 0)$ where it applies. Hence $v(y, s) \ge v'(y, s)$, and $(y, s)$ is a stopping position if $y \ge -s + \tfrac12$.

The procedure defined by $z'(s)$ is suboptimal for the original problem. For that problem it yields a risk

(6.5)  $v''(y, s) = v'(y, s) + E'\{\tfrac12 e^{2Y(0) - 1} + \min(-Y(0), 0) \mid Y(s) = y\},$

where $E'$ represents the expectation restricted to those paths which are not stopped until $s = 0$. It is easily shown that this term approaches zero uniformly in $y$ as $s \to \infty$. Then (6.3) follows, since $v(y, s) \le v''(y, s)$. It is also clear that for any $\epsilon > 0$, when $s$ is sufficiently large, $y < -s + \tfrac12 - \epsilon$ implies that $v(y, s) < -s$, and $(y, s)$ is a continuation point.

It remains to show that the optimal stopping set $S$ consists of an interval $[z(s), \infty)$ for each $s$. But if $(y, s)$ is in the continuation region, we can easily show that $v(y - \Delta, s) \le v(y, s) < -s$ from a consideration of the policy obtained by translating $S$ downwards an amount $\Delta$. The resulting procedure is suboptimal, but since $d(y, 0)$ is monotone in $y$, it leads to the desired inequality.
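The comparison function in this proof can be checked directly. That $v'$ solves the diffusion equation is immediate, since $\tfrac12 v'_{yy}$ and $v'_s$ both equal $-e^{2y+2s-1}$. The sketch below is our own (the function names, sample values of $s$ and step size are illustrative): it confirms that $v' = -s$ with zero slope on the line $y = -s + \tfrac12$, and that $v'(y, 0) \le d(y, 0)$ for $y \le \tfrac12$, which is the comparison that makes the problem for $v'$ the more favorable one.

```python
import math


def v_prime(y, s):
    """v'(y, s) = y - (1/2) exp(2y + 2s - 1), from the proof of Lemma 6.1."""
    return y - 0.5 * math.exp(2.0 * y + 2.0 * s - 1.0)


def d_terminal(y):
    """Terminal cost d(y, 0) of (6.2): 0 for y >= 0 and y for y < 0."""
    return min(y, 0.0)


def boundary_check(s, h=1e-6):
    """Return (v' + s, central-difference slope of v') on the line y = -s + 1/2."""
    z = -s + 0.5
    slope = (v_prime(z + h, s) - v_prime(z - h, s)) / (2.0 * h)
    return v_prime(z, s) + s, slope


for s in (0.5, 2.0, 5.0):
    print(s, boundary_check(s))

# Terminal comparison v'(y, 0) <= d(y, 0) on a grid of y <= 1/2 (equality at y = 1/2).
assert all(v_prime(k / 100.0, 0.0) <= d_terminal(k / 100.0) + 1e-12 for k in range(-300, 51))
```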

We remark that the lemma was prompted by the fact that a standard Wiener process $\{W(t); t \ge 0\}$ starting at the origin intersects the line $w = a + mt$, $a > 0$, $m > 0$, with probability $e^{-2am}$.
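The crossing probability $e^{-2am}$ can also be checked by simulation. The Monte Carlo sketch below is our own illustration, not part of the paper. It simulates the gap $a + mt - W(t)$ on a coarse time grid and, between grid points, uses the standard Brownian-bridge formula $\exp(-2g_0g_1/\Delta t)$ for the chance of touching a straight line whose distances from the path at the two ends of a step are $g_0, g_1 > 0$. For a straight line this per-step correction is exact, so only the finite horizon and sampling error remain. The horizon, step, path count and seed are assumptions of the sketch.

```python
import math
import random


def crossing_probability(a, m, horizon=10.0, dt=0.05, n_paths=5000, seed=12345):
    """Monte Carlo estimate of P{W(t) = a + m t for some t <= horizon}.

    The gap G(t) = a + m t - W(t) is simulated on a grid of step dt.  Given
    G > 0 at both ends of a step, the Brownian bridge between them touches
    zero with probability exp(-2 * g0 * g1 / dt).
    """
    rng = random.Random(seed)
    sd = math.sqrt(dt)
    n_steps = int(round(horizon / dt))
    hits = 0
    for _ in range(n_paths):
        gap = a
        for _ in range(n_steps):
            new_gap = gap + m * dt - rng.gauss(0.0, sd)
            if new_gap <= 0.0 or rng.random() < math.exp(-2.0 * gap * new_gap / dt):
                hits += 1
                break
            gap = new_gap
    return hits / n_paths


estimate = crossing_probability(a=0.5, m=1.0)
print(f"simulated {estimate:.3f}, formula e^(-2am) = {math.exp(-1.0):.3f}")
```

For $a = \tfrac12$, $m = 1$ and the default horizon, the probability of a first crossing after $t = 10$ is of order $10^{-4}$ or less, so the comparison is dominated by the sampling error of about $0.007$.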

In section 5, we showed that the boundary $y(s)$ of the optimal stopping region for the spaceship control problem specified by (6.1) satisfies $y(s) \le s^{-1} + \tfrac12 s^2$ in general. We can now prove the following theorem.

Theorem 6.2. As $s \to 0$, $y(s) = s^{-1} + \tfrac12 s^2 + o(s^2)$.

Proof. If no stopping is permitted when $0 < s < s_0$, the problem becomes one with minimum risk $V^*(y, s) \ge V(y, s)$, where $V^*(y, s_0) = \min(y, s_0^{-1})$ for $y > 0$. Again, if $\{Y(s)\}$ is a Wiener process in the $(-s)$ scale, then $\{aY(s)\}$ is a Wiener process in the $(-a^2 s)$ scale. Thus, the above constrained problem may be transformed by setting

(6.6)  $y^* = s_0^{-2}(y - s_0^{-1}), \qquad s^* = s_0^{-4}(s - s_0), \qquad v^* = s_0^{-2}(V - s_0^{-1}),$

to the problem with stopping and terminal risk

(6.7)  $d^*(y^*, s^*) = s_0^{-2}\{(s_0 + s_0^4 s^*)^{-1} - s_0^{-1}\} = -s^* + s_0^3 s^{*2} - \cdots, \qquad (y^* > -s_0^{-3},\ s^* > 0),$
       $d^*(y^*, 0) = \min(y^*, 0), \qquad (y^* > -s_0^{-3}),$
       $d^*(-s_0^{-3}, s^*) = -s_0^{-3}, \qquad (s^* > 0).$

This is approximately the auxiliary problem of Lemma 6.1 when $s_0$ is small.

Consider the procedure with risk $v^{*\prime\prime}(y^*, s^*)$, which consists of stopping whenever $y^* \ge -s^* + \tfrac12$. By comparing (6.7) with (6.2), the difference $v^{*\prime\prime}(y^*, s^*) - v''(y^*, s^*)$ can be expressed as a sum of two contributions. One is due to paths which stop along $y^* = -s^* + \tfrac12$ for some $s^* > 0$, and the other is due to paths which stop along $y^* = -s_0^{-3}$. We take the initial $s^* = s_0^{-1}$. Clearly,

(6.8)  $\sup_{0 < s' < s_0^{-1}} |d^*(-s' + \tfrac12, s') - d(-s' + \tfrac12, s')| = O(s_0),$

(6.9)  $\sup_{0 < s' < s_0^{-1}} |d^*(-s_0^{-3}, s') - v''(-s_0^{-3}, s')| = o(1).$

It follows that

(6.10)  $v^{*\prime\prime}(y^*, s_0^{-1}) = v''(y^*, s_0^{-1}) + o(1) = v'(y^*, s_0^{-1}) + o(1),$

and the optimal risk for (6.7) satisfies

(6.11)  $v^*(y^*, s_0^{-1}) \le v^{*\prime\prime}(y^*, s_0^{-1}) < d^*(y^*, s_0^{-1})$

when $y^* < -s_0^{-1} + \tfrac12 - \epsilon$ and $s_0$ is sufficiently small. Then, on substituting back into (6.4), with $s = s_0 + s_0^4 s^* = s_0 + s_0^3$, we find that $V(y, s) \le V^*(y, s) < s^{-1}$ whenever

(6.12)  $y < s_0^{-1} + s_0^2(-s_0^{-1} + \tfrac12 - \epsilon) = s^{-1} + (\tfrac12 - \epsilon)s^2 + o(s^2).$

In other words,

(6.13)  $y(s) \ge s^{-1} + \tfrac12 s^2 + o(s^2), \qquad (s \to 0),$

which concludes the proof.

The above derivation provides a close upper bound for the optimal risk $V(y, s)$ near the boundary. Since $s^* = s_0^{-1}$ corresponds to $s = s_0 + s_0^3$, this indicates that for small values of $s_0$, the effect of forbidding any stops between 0 and $s_0$ is small and diminishes rapidly as $s$ increases from $s_0$.

7.  Formal  expansions  for  small  s 

In this section we derive the formal expansions, for $s \to 0$,

(7.1)  $y(s) = s^{-1} + \tfrac12 s^2 - \tfrac12 s^5 + \tfrac72 s^8 + \cdots,$

(7.2)  $V(y, s) = y - s^2 e^{2\delta'}\{\tfrac12 + s^3(\delta' - \delta'^2) + s^6(\tfrac14 - 7\delta' + 6\delta'^2 - 3\delta'^3 + \delta'^4) + \cdots\},$

where $\delta' = s^{-2}(y - s^{-1}) - \tfrac12$ and $y \le y(s)$.

The expansions are motivated by the preceding results, which indicated the major importance of the boundary $y = -s + \tfrac12$ for large $s$ in the auxiliary problem (6.2). This in turn points to the relevance of the distribution and moments of the time $T$ when the Wiener process $\{W(t), t \ge 0\}$, starting at the origin, first intersects the line $w = a + mt$, $a > 0$, $m > 0$. This distribution is defective, since $P(T < \infty) = e^{-2am}$, and is known to have moment generating function $E\{e^{\lambda T}; T < \infty\} = \exp\{-a(m + (m^2 - 2\lambda)^{1/2})\}$. Thus, in the notation of section 6, the moments of the time required for a path from $(y^*, s^*)$ to hit the line $L: y^* = -s^* + \tfrac12$ can be expressed in the form $e^{2\delta}P(\delta)$, where $P(\delta)$ is a polynomial in $\delta$ and where

(7.3)  $\delta = -a = y^* + s^* - \tfrac12.$

One is led to consider solutions of the diffusion equation $\tfrac12 H_{y^*y^*} = H_{s^*}$ of the form $H_n(y^*, s^*) = e^{2\delta} u_n(\delta + s^*, s^*)$, where the functions $u_n(x, t)$ are polynomial solutions of the corresponding equation $\tfrac12 u_{xx} = u_t$:

(7.4)  $u_0(x, t) = 1,$
       $u_1(x, t) = x,$
       $u_2(x, t) = \tfrac12(x^2 + t),$
       $u_3(x, t) = \tfrac16(x^3 + 3xt),$
       $u_4(x, t) = \tfrac{1}{24}(x^4 + 6x^2 t + 3t^2),$

and so on. We replace these for convenience by the linear combinations $w_n(x, t)$, selected so that $w_n(t, t) = t^n$. Hence,

(7.5)  $w_0(x, t) = u_0(x, t) = 1,$
       $w_1(x, t) = u_1(x, t) = x,$
       $w_2(x, t) = 2u_2 - w_1 = x^2 + t - x,$
       $w_3(x, t) = 6u_3 - 3w_2 = x^3 + 3xt - 3x^2 - 3t + 3x,$
       $w_4(x, t) = 24u_4 - 6w_3 - 3w_2 = x^4 - 6x^3 + 15x^2 - 15x + 15t - 18xt + 6x^2 t + 3t^2.$
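The two defining properties of the $w_n$, that each solves $\tfrac12 w_{xx} = w_t$ and that $w_n(t, t) = t^n$, can be verified in exact rational arithmetic. The sketch below is our own check: it stores a polynomial in $(x, t)$ as a dictionary of coefficients (a representation chosen for this illustration), rebuilds $w_2$, $w_3$, $w_4$ from the $u_n$ of (7.4) by the combinations shown in (7.5), and asserts both properties.

```python
from fractions import Fraction


def poly(*terms):
    """Polynomial in (x, t) from (coefficient, power of x, power of t) triples,
    stored as {(i, j): coefficient}; like terms are combined and zeros dropped."""
    out = {}
    for c, i, j in terms:
        out[(i, j)] = out.get((i, j), Fraction(0)) + Fraction(c)
    return {k: v for k, v in out.items() if v != 0}


def terms(p):
    return [(c, i, j) for (i, j), c in p.items()]


def add(*ps):
    return poly(*[term for p in ps for term in terms(p)])


def scale(p, k):
    return poly(*[(c * k, i, j) for c, i, j in terms(p)])


def d_dx(p):
    return poly(*[(c * i, i - 1, j) for c, i, j in terms(p) if i > 0])


def d_dt(p):
    return poly(*[(c * j, i, j - 1) for c, i, j in terms(p) if j > 0])


def on_diagonal(p):
    """Set x = t and return {power of t: coefficient}."""
    out = {}
    for c, i, j in terms(p):
        out[i + j] = out.get(i + j, Fraction(0)) + c
    return {k: v for k, v in out.items() if v != 0}


# Heat polynomials u_n of (7.4)
U = [
    poly((1, 0, 0)),
    poly((1, 1, 0)),
    poly((Fraction(1, 2), 2, 0), (Fraction(1, 2), 0, 1)),
    poly((Fraction(1, 6), 3, 0), (Fraction(1, 2), 1, 1)),
    poly((Fraction(1, 24), 4, 0), (Fraction(1, 4), 2, 1), (Fraction(1, 8), 0, 2)),
]

# The combinations of (7.5)
W = [U[0], U[1]]
W.append(add(scale(U[2], 2), scale(W[1], -1)))
W.append(add(scale(U[3], 6), scale(W[2], -3)))
W.append(add(scale(U[4], 24), scale(W[3], -6), scale(W[2], -3)))

for n, w in enumerate(W):
    assert scale(d_dx(d_dx(w)), Fraction(1, 2)) == d_dt(w)  # (1/2) w_xx = w_t
    assert on_diagonal(w) == {n: Fraction(1)}                # w_n(t, t) = t**n
print("w_0, ..., w_4 satisfy both properties")
```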

Now let

(7.6)  $J_n(\delta, s^*) = e^{2\delta} w_n(\delta + s^*, s^*).$

Then we have

(7.7)  $w_0(\delta + s^*, s^*) = 1,$
       $w_1(\delta + s^*, s^*) = s^* + \delta,$
       $w_2(\delta + s^*, s^*) = s^{*2} + \delta(2s^* - 1) + \delta^2,$
       $w_3(\delta + s^*, s^*) = s^{*3} + \delta(3s^{*2} - 3s^* + 3) + \delta^2(3s^* - 3) + \delta^3,$
       $w_4(\delta + s^*, s^*) = s^{*4} + \delta(4s^{*3} - 6s^{*2} + 12s^* - 15) + \delta^2(6s^{*2} - 12s^* + 15) + \delta^3(4s^* - 6) + \delta^4;$
$$
\begin{aligned}
J_0 &= e^{2\delta} = 1 + 2\delta + 2\delta^2 + \tfrac43\delta^3 + \tfrac23\delta^4 + \cdots,\\
J_1 &= s^* + \delta(2s^* + 1) + \delta^2(2s^* + 2) + \delta^3(\tfrac43 s^* + 2) + \delta^4(\tfrac23 s^* + \tfrac43) + \cdots,\\
J_2 &= s^{*2} + \delta(2s^{*2} + 2s^* - 1) + \delta^2(2s^{*2} + 4s^* - 1) + \delta^3(\tfrac43 s^{*2} + 4s^*) + \cdots,\\
J_3 &= s^{*3} + \delta(2s^{*3} + 3s^{*2} - 3s^* + 3) + \delta^2(2s^{*3} + 6s^{*2} - 3s^* + 3) + \cdots,\\
J_4 &= s^{*4} + \delta(2s^{*4} + 4s^{*3} - 6s^{*2} + 12s^* - 15) + \cdots,
\end{aligned}
\tag{7.8}
$$

and so on. We can easily expand $\partial J_n/\partial\delta$ from (7.8).
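The expansions (7.8) can be reproduced by substituting $x = \delta + s^*$, $t = s^*$ into the polynomials of (7.5) and multiplying by the series for $e^{2\delta}$. A minimal sketch with exact fractions follows; the function name `J` and the dictionary layout are our own choices.

```python
from fractions import Fraction
from math import comb, factorial

# w_n(x, t) from (7.5), as {(i, j): coeff} for coeff * x**i * t**j.
W = [
    {(0, 0): 1},
    {(1, 0): 1},
    {(2, 0): 1, (0, 1): 1, (1, 0): -1},
    {(3, 0): 1, (1, 1): 3, (2, 0): -3, (0, 1): -3, (1, 0): 3},
    {(4, 0): 1, (3, 0): -6, (2, 0): 15, (1, 0): -15,
     (0, 1): 15, (1, 1): -18, (2, 1): 6, (0, 2): 3},
]


def J(n, order):
    """Coefficients of delta**k (k <= order) in J_n = e^(2 delta) w_n(delta + s*, s*).

    Entry k of the result is {power of s*: coeff}.
    """
    out = [{} for _ in range(order + 1)]
    for (i, j), c in W[n].items():
        for k in range(min(i, order) + 1):        # (delta + s*)**i, keep delta**k
            for m in range(order + 1 - k):        # times e^(2 delta) term 2^m delta^m / m!
                p = i - k + j
                out[k + m][p] = out[k + m].get(p, 0) + c * comb(i, k) * Fraction(2 ** m, factorial(m))
    return [{p: v for p, v in d.items() if v != 0} for d in out]


for k, coeffs in enumerate(J(2, 3)):
    print(f"delta^{k}:", dict(sorted(coeffs.items(), reverse=True)))
```

Differentiating term by term in $\delta$ then gives the coefficients needed for $\partial J_n/\partial\delta$.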

For  the  problem  specified  by  (6.7),  we  shall  consider  solutions  of  the  following 
form,  for  the  optimal  risk  and  boundary  curve. 

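$$ v^*(y^*, s^*) = y^* - \tfrac12 e^{2\delta} + c_0(s_0)J_0 + c_1(s_0)J_1 + c_2(s_0)J_2 + \cdots, \tag{7.9} $$

$$ \tilde y^*(s^*) = -s^* + \tfrac12 + \delta(s^*), \qquad \delta(s^*) = \delta_1(s^*)s_0^3 + \delta_2(s^*)s_0^6 + \delta_3(s^*)s_0^9 + \cdots. \tag{7.10} $$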

The coefficients $c_n(s_0)$ and $\delta_n(s^*)$ can be selected to approximate the boundary
conditions $v^*(y^*, s^*) = d^*(y^*, s^*)$ and $v^*_{y^*}(y^*, s^*) = 0$ on the curve $y^* = \tilde y^*(s^*)$, where $\delta = \delta(s^*)$. We have

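$$ \sum_n c_n(s_0)J_n(\delta, s^*) = s_0^3 s^{*2} - s_0^6 s^{*3} + s_0^9 s^{*4} - \cdots + \delta^2 + \tfrac23\delta^3 + \tfrac13\delta^4 + \cdots, \tag{7.11} $$

$$ \sum_n c_n(s_0)\,\frac{\partial J_n}{\partial\delta}(\delta, s^*) = 2\delta + 2\delta^2 + \tfrac43\delta^3 + \cdots. \tag{7.12} $$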

We match the coefficients, taking $s_0$ small and treating $s^*$ as $O(1)$. The dominant
term on the right of (7.11) is $s_0^3 s^{*2}$, which calls for $c_2 = s_0^3$. Then the left side
of (7.12) begins with $s_0^3(2s^{*2} + 2s^* - 1)$, and the right side with $2s_0^3\delta_1(s^*)$, which
implies that $\delta_1(s^*) = s^{*2} + s^* - \tfrac12$. By substituting again in (7.11) and carrying
terms of order $s_0^6$, we find that

$$ c_4 = -s_0^6, \qquad c_3 = -3s_0^6, \qquad c_1 = s_0^6, \qquad c_0 = -\tfrac14 s_0^6. \tag{7.13} $$

Next, a similar comparison for (7.12) shows that

$$ \delta_2(s^*) = -s^{*3} + \tfrac12 s^{*2} - \tfrac52 s^* + \tfrac72, \tag{7.14} $$

and so on. The formal, and so far unjustified, substitution of $s^* = 0$, with the
transformation (6.6), gives

$$ s = s_0, \qquad \tilde y(s) = s_0^{-1} + s_0^2\,\tilde y^*(0), \qquad V(y, s) = s_0^{-1} + s_0^2\,v^*(y^*, 0), \tag{7.15} $$

and  finally  yields  the  desired  terms  of  (7.1)  and  (7.2). 
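The order-by-order matching that produces (7.13) and (7.14) can be automated. The sketch below rebuilds the $\delta$-coefficients of $J_n$ as in (7.8), then solves (7.11) and (7.12) at orders $s_0^3$ and $s_0^6$, writing $c_n = c_n^{(1)}s_0^3 + c_n^{(2)}s_0^6 + \cdots$ (this split and all helper names are ours, not the authors'). It uses $J_n(0, s^*) = s^{*n}$, so the $c_n$ can be read off a polynomial in $s^*$.

```python
from fractions import Fraction
from math import comb, factorial

# w_n(x, t) from (7.5), as {(i, j): coeff} for coeff * x**i * t**j.
W = [
    {(0, 0): 1},
    {(1, 0): 1},
    {(2, 0): 1, (0, 1): 1, (1, 0): -1},
    {(3, 0): 1, (1, 1): 3, (2, 0): -3, (0, 1): -3, (1, 0): 3},
    {(4, 0): 1, (3, 0): -6, (2, 0): 15, (1, 0): -15,
     (0, 1): 15, (1, 1): -18, (2, 1): 6, (0, 2): 3},
]


def J(n, order):
    """Coefficient of delta**k (k <= order) in J_n, each as {power of s*: coeff}."""
    out = [{} for _ in range(order + 1)]
    for (i, j), c in W[n].items():
        for k in range(min(i, order) + 1):
            for m in range(order + 1 - k):
                p = i - k + j
                out[k + m][p] = out[k + m].get(p, 0) + c * comb(i, k) * Fraction(2 ** m, factorial(m))
    return [{p: v for p, v in d.items() if v != 0} for d in out]


def padd(*terms):
    """Linear combination of (scalar, polynomial-in-s*) pairs."""
    out = {}
    for c, p in terms:
        for j, v in p.items():
            out[j] = out.get(j, 0) + c * v
    return {j: v for j, v in out.items() if v != 0}


def pmul(p, q):
    """Product of two polynomials in s*."""
    out = {}
    for i, a in p.items():
        for j, b in q.items():
            out[i + j] = out.get(i + j, 0) + a * b
    return {k: v for k, v in out.items() if v != 0}


Jc = [J(n, 2) for n in range(5)]          # Jc[n][k]: coefficient of delta^k in J_n

# Order s0^3 of (7.11): sum_n c_n^(1) s*^n = s*^2, so only c_2^(1) = 1.
c1 = {2: 1}
# Order s0^3 of (7.12): sum_n c_n^(1) J_{n,1} = 2 delta_1.
delta1 = padd(*[(Fraction(c, 2), Jc[n][1]) for n, c in c1.items()])

# Order s0^6 of (7.11): sum c_n^(2) s*^n + sum c_n^(1) J_{n,1} delta_1 = -s*^3 + delta_1^2.
c2 = padd((1, {3: -1}), (1, pmul(delta1, delta1)),
          *[(-c, pmul(Jc[n][1], delta1)) for n, c in c1.items()])

# Order s0^6 of (7.12): sum c_n^(2) J_{n,1} + 2 sum c_n^(1) J_{n,2} delta_1 = 2 delta_2 + 2 delta_1^2.
lhs = padd(*[(c, Jc[n][1]) for n, c in c2.items()],
           *[(2 * c, pmul(Jc[n][2], delta1)) for n, c in c1.items()])
delta2 = padd((Fraction(1, 2), lhs), (-1, pmul(delta1, delta1)))

print("delta_1:", dict(sorted(delta1.items())))
print("c_n^(2):", dict(sorted(c2.items())))
print("delta_2:", dict(sorted(delta2.items())))
```

The printed values reproduce $\delta_1$, the coefficients (7.13) (with no $s_0^6$ correction to $c_2$), and (7.14).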

The formal expansions (7.9) and (7.10) seem to be justifiable for the auxiliary
problem when $s^*$ is large. Although the authors have not carried out the details
of a proof, it seems to be fairly straightforward. We have treated $s^*$ as $O(1)$, but
clearly the operations would be meaningful for large $s^*$, provided that $s_0^3 s^{*2} \to 0$.
However, it cannot be expected that the substitution $s^* = 0$ will yield the solution
of the auxiliary problem (6.7). In fact, this substitution does not yield values for
$v^*(y^*, 0)$ which coincide with $d^*(y^*, 0) = \min(y^*, 0)$. On the other hand, if the
expansions (7.9) and (7.10) are meaningful, they should be valid for, say, $s^* = s_0^{-1}$
and $s^* = 2s_0^{-1}$. Thus, one would expect the expansions to be self-consistent in the
sense that different pairs $(s_0, s^*)$ yield approximately the same results whenever the
corresponding values of $s = s_0 + s_0^4 s^*$ coincide. Indeed, the initial terms have this
consistency property exactly, for all $s^* > 0$. This fact indicates that there is no effect
due to the computationally convenient substitution $s^* = 0$, and it justifies the results
(7.1) and (7.2).

The above consistency provides a useful check on the calculations of $\delta_1(s^*)$
and $\delta_2(s^*)$. For example, the condition that $y = s_0^{-1} + s_0^2\tilde y^*$ remains constant
when $s = s_0 + s_0^4 s^*$ is constant can be seen to imply that $\delta_1(s^*) = s^{*2} + s^* + K_1$
and, given $K_1 = -\tfrac12$, that $\delta_2(s^*) = -s^{*3} + \tfrac12 s^{*2} - \tfrac52 s^* + K_2$.
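The consistency check just described is easy to carry out numerically. The sketch below assumes the transformation (6.6) takes the form $s = s_0 + s_0^4 s^*$, $y = s_0^{-1} + s_0^2 y^*$ (exponents inferred from (7.11) and (7.15), not restated in this section) and uses $\delta_1$ and $\delta_2$ with $K_2 = \tfrac72$ as in (7.14). It compares the boundary reached directly at $s$ (via $s^* = 0$) with the one reached from a smaller $s_0$ and $s^* > 0$; with the truncation used here, the discrepancy should be of order $s_0^{11}$. Function names are hypothetical.

```python
def delta1(t):
    return t * t + t - 0.5


def delta2(t):
    return -t ** 3 + 0.5 * t * t - 2.5 * t + 3.5


def boundary(s0, t):
    """y on the boundary, y = 1/s0 + s0^2 * ytilde*(t), truncated after delta_2."""
    ytilde = -t + 0.5 + delta1(t) * s0 ** 3 + delta2(t) * s0 ** 6
    return 1.0 / s0 + s0 ** 2 * ytilde


def discrepancy(s0, t):
    """Difference between the two routes to the same s = s0 + s0^4 t."""
    s = s0 + s0 ** 4 * t
    return boundary(s, 0.0) - boundary(s0, t)


for s0 in (0.2, 0.1):
    print(s0, discrepancy(s0, 1.0))
```

Halving $s_0$ should shrink the discrepancy by roughly a factor $2^{11}$, which is what one expects if the terms through $s_0^8$ agree exactly.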


8.  Refined  bounds  for  $s \to \infty$


In this section, we shall establish the following upper and lower bounds for
$\tilde\alpha(s) = \tilde y(s)s^{-1/2}$ as $s \to \infty$, to improve the results obtained in section 5.

Theorem 8.1. (i) There is a constant $K_0 > 0$ such that $\tilde\alpha(s) > 1 + K_0 s^{-\eta_0}$
$(s \to \infty)$, where $\eta_0 = 1.61005$.

(ii) If $\eta < \eta_0$, there is a constant $K_\eta > 0$ such that $\tilde\alpha(s) < 1 + K_\eta s^{-\eta}$ $(s \to \infty)$.
We  review  a  few  relevant  facts  before  proceeding  to  the  main  argument. 
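Certain important solutions of the diffusion equation have the form $u(y, s) =
s^{-\lambda/2}A_\lambda(\alpha)$, $\alpha = ys^{-1/2}$, where $A_\lambda(\alpha)$ satisfies

$$ A_\lambda''(\alpha) + \alpha A_\lambda'(\alpha) + \lambda A_\lambda(\alpha) = 0. \tag{8.1} $$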


We observe that $A_\lambda'(\alpha)$ is a candidate for $A_{\lambda+1}(\alpha)$. One example for $\lambda = 2$, which
will be useful, is $\alpha\varphi(\alpha)$. The odd solutions of (8.1) are of special interest. A power
series expansion shows that these can be expressed in terms of the confluent
hypergeometric function as


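$$ A_\lambda(\alpha) = \alpha F\Bigl(\frac{\lambda + 1}{2}, \frac{3}{2}; -\frac{\alpha^2}{2}\Bigr), \tag{8.2} $$

where

$$ F(\beta, \gamma; w) = 1 + \frac{\beta}{\gamma}w + \frac{\beta(\beta + 1)}{\gamma(\gamma + 1)}\frac{w^2}{2!} + \cdots. \tag{8.3} $$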


With this definition, the function $A_\lambda(\alpha)$ is continuous in $\lambda$, and the smallest
value of $\lambda > 1$ for which $A_\lambda(1) = 0$ is $\lambda_0 = 5.22010$. We note also that $A_\lambda(\alpha) > 0$
for $1 < \lambda < \lambda_0$, $0 < \alpha < 1$, and $A_{\lambda_0}'(1) < 0$. Finally, $A_\lambda(\alpha)$ is a bounded function
of $\alpha$.
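A direct way to work with $A_\lambda$ is to sum the series (8.3). The sketch below does that and checks by finite differences that the result satisfies (8.1), taken here in the form $A_\lambda'' + \alpha A_\lambda' + \lambda A_\lambda = 0$ (the form obtained by substituting $u = s^{-\lambda/2}A_\lambda(ys^{-1/2})$ into $\tfrac12 u_{yy} = u_s$; this form is our reading, stated as an assumption). It also confirms that for $\lambda = 2$ the series collapses to $\alpha e^{-\alpha^2/2}$, a multiple of $\alpha\varphi(\alpha)$. The function names are ours; this is an illustration, not the authors' computation.

```python
import math


def kummer(beta, gamma, w, tol=1e-16, max_terms=500):
    """Confluent hypergeometric series F(beta, gamma; w) of (8.3)."""
    term = total = 1.0
    for k in range(max_terms):
        term *= (beta + k) / (gamma + k) * w / (k + 1)
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def A(lam, alpha):
    """Odd solution A_lambda(alpha) = alpha F((lambda + 1)/2, 3/2; -alpha^2/2) of (8.2)."""
    return alpha * kummer((lam + 1) / 2, 1.5, -alpha * alpha / 2)


def ode_residual(lam, alpha, h=1e-4):
    """Central-difference value of A'' + alpha A' + lambda A, which should be close to 0."""
    am, a0, ap = A(lam, alpha - h), A(lam, alpha), A(lam, alpha + h)
    d1 = (ap - am) / (2 * h)
    d2 = (ap - 2 * a0 + am) / (h * h)
    return d2 + alpha * d1 + lam * a0


print(ode_residual(3.0, 0.8))
print(A(2.0, 1.2), 1.2 * math.exp(-0.72))   # lambda = 2: alpha * exp(-alpha^2 / 2)
```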

The proof of (i) involves considering a solution of the diffusion equation of the
form

$$ B\Bigl\{s^{-1}\frac{\alpha\varphi(\alpha)}{\varphi(1)} - b\,s^{-\lambda_0/2}A_{\lambda_0}(\alpha)\Bigr\}, \tag{8.4} $$


which  corresponds  to  a  “less  favorable”  problem  than  ours.  As  a  preliminary,  we 
note  that  if  c  >  0  is  small  enough,  the  function 
$$ G_0(c, \alpha) = \frac{\alpha\varphi(\alpha)}{\varphi(1)} - cA_{\lambda_0}(\alpha), \tag{8.5} $$

which is equal to 1 at $\alpha = 1$, attains a local maximum $M_0(c)$ near $\alpha = 1$. In fact,
as $c \to 0+$, $M_0(c) \approx 1 + \tfrac14 c^2\{A_{\lambda_0}'(1)\}^2$. This maximum is increasing in $c$ for small
$c > 0$ and is attained at $\alpha_0(c) \approx 1 - \tfrac12 cA_{\lambda_0}'(1) > 1$.

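Now take any $s_0 > \tfrac12$ and consider

$$ V_0(y, s) = \{M_0(cs_0^{-\eta_0})\}^{-1}\Bigl\{s^{-1}\frac{\alpha\varphi(\alpha)}{\varphi(1)} - cs^{-\lambda_0/2}A_{\lambda_0}(\alpha)\Bigr\}, \tag{8.6} $$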

where $\eta_0 = \tfrac12\lambda_0 - 1 = 1.61005$. If $c > 0$ is sufficiently small (not depending
on $s_0$), it is easy to see that $D_0(y, \tfrac12) = V_0(y, \tfrac12) > V(y, \tfrac12)$, at least when
$0 < y < 2^{-1/2}\alpha_0\bigl(c(\tfrac12)^{-\eta_0}\bigr)$. Since $G_0(cs^{-\eta_0}, 1) = 1$ and $M_0(cs^{-\eta_0})$ is decreasing in $s$
for $\tfrac12 < s < s_0$, the equation $V_0(y, s) = s^{-1}$ has a solution $y_0(s) = s^{1/2}\alpha_0(s)$.
Further, $\alpha_0(s)$ always lies between 1 and $\alpha_0(cs^{-\eta_0})$, and $\alpha_0(s_0)$ coincides with
$\alpha_0(cs_0^{-\eta_0})$.

Thus, the curve $y_0(s)$ determines a procedure for $\tfrac12 < s < s_0$ with risk $V_0(y, s)$
in the continuation region, for the problem specified by

$$
\begin{aligned}
D_0(y, s) &= s^{-1}, && (y > 0,\ \tfrac12 < s < s_0),\\
D_0(0, s) &= 0, && (\tfrac12 < s < s_0),\\
D_0(y, \tfrac12) &> V(y, \tfrac12), && (0 < y < y_0(\tfrac12)).
\end{aligned}
\tag{8.7}
$$

This problem is less favorable than the original, and furthermore $V_0(y, s_0) < s_0^{-1}$
when $ys_0^{-1/2} < \alpha_0(s_0) \approx 1 - \tfrac12 cs_0^{-\eta_0}A_{\lambda_0}'(1)$. This establishes (i) and also shows that
$V(y, s_0) < V_0(y, s_0)$.

We shall prove (ii) by comparing the original problem with one for which the
minimum risk is given by a solution of the diffusion equation

$$ V_\lambda(y, s) = B\Bigl\{s^{-1}\frac{\alpha\varphi(\alpha)}{\varphi(1)} - b\,s^{-\lambda/2}A_\lambda(\alpha)\Bigr\}, \tag{8.8} $$

with $\lambda = 2\eta + 2 < \lambda_0$. It suffices to consider $\eta$ close to $\eta_0$, and hence we may
assume that $A_\lambda(1) > 0$ and $A_\lambda'(1) < 0$. For small values of $c > 0$ and $\alpha$ near 1, the
function

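$$
\begin{aligned}
G_\lambda(c, \alpha) &= \frac{\alpha\varphi(\alpha)}{\varphi(1)} - cA_\lambda(\alpha)\\
&\approx 1 - (\alpha - 1)^2 - cA_\lambda(1) - c(\alpha - 1)A_\lambda'(1).
\end{aligned}
\tag{8.9}
$$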

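Thus $G_\lambda(c, \alpha)$ has a local maximum at

$$ \alpha_\lambda(c) \approx 1 - \tfrac12 cA_\lambda'(1) > 1, \tag{8.10} $$

and the maximum value

$$ M_\lambda(c) \approx 1 - cA_\lambda(1) + \tfrac14 c^2\{A_\lambda'(1)\}^2, \tag{8.11} $$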

is decreasing in $c$. Since $A_\lambda(\alpha)$ is bounded, $c$ can be kept small enough to ensure
that this is the absolute maximum for $\alpha > 0$.
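The approximations (8.10) and (8.11) can be checked numerically for a specific $\lambda$. The sketch below takes $\lambda = 4$ purely for illustration, because (8.2) then has the closed form $A_4(\alpha) = (\alpha - \alpha^3/3)e^{-\alpha^2/2}$, with $A_4(1) > 0$ and $A_4'(1) < 0$ as assumed before (8.9). It locates the maximum of $G_4(c, \alpha)$ by golden-section search and compares it with $1 - \tfrac12 cA_4'(1)$ and $1 - cA_4(1) + \tfrac14 c^2\{A_4'(1)\}^2$; the helper names are ours.

```python
import math


def A4(alpha):
    """A_4(alpha) = alpha F(5/2, 3/2; -alpha^2/2) = (alpha - alpha^3/3) exp(-alpha^2/2)."""
    return (alpha - alpha ** 3 / 3) * math.exp(-alpha * alpha / 2)


def dA4(alpha):
    """Derivative of A4."""
    return (1 - 2 * alpha ** 2 + alpha ** 4 / 3) * math.exp(-alpha * alpha / 2)


def G(c, alpha):
    """alpha phi(alpha) / phi(1) - c A_4(alpha), i.e. (8.9) with lambda = 4."""
    return alpha * math.exp((1 - alpha * alpha) / 2) - c * A4(alpha)


def golden_max(f, lo, hi, tol=1e-10):
    """Maximiser of a unimodal function f on [lo, hi] by golden-section search."""
    r = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    x1, x2 = b - r * (b - a), a + r * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                      # maximum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = f(x2)
        else:                            # maximum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - r * (b - a)
            f1 = f(x1)
    return (a + b) / 2


def local_max(c):
    """Numerical (alpha_lambda(c), M_lambda(c)) near alpha = 1."""
    arg = golden_max(lambda x: G(c, x), 0.8, 1.3)
    return arg, G(c, arg)


A1, dA1 = A4(1.0), dA4(1.0)
for c in (0.01, 0.02):
    arg, top = local_max(c)
    print(c, arg, 1 - 0.5 * c * dA1, top, 1 - c * A1 + 0.25 * c * c * dA1 ** 2)
```

The numerical maximum should agree with the approximations up to terms of higher order in $c$, and it decreases as $c$ grows.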
We now choose $B = 1/M_\lambda(bs_0^{-\eta})$ in (8.8), so that

$$ V_\lambda(y, s) = \frac{G_\lambda(bs^{-\eta}, \alpha)}{s\,M_\lambda(bs_0^{-\eta})}. \tag{8.12} $$

For each $s$, let $y_\lambda(s) = s^{1/2}\alpha_\lambda(s)$, where $\alpha_\lambda(s) = \alpha_\lambda(bs^{-\eta})$. Thus, provided that $bs_1^{-\eta}$
is small, $V_\lambda(y, s)$ and the curve $y_\lambda(s)$ satisfy the free boundary conditions and
represent the optimal solution of a stopping problem. If we restrict our attention
to $\{(y, s): y > 0,\ s_1 < s < s_0\}$, the appropriate stopping and terminal cost is


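$$
\begin{aligned}
D_\lambda(y, s) &= \frac{M_\lambda(bs^{-\eta})}{M_\lambda(bs_0^{-\eta})}\,s^{-1} \le s^{-1}, && (y > 0,\ s_1 < s < s_0),\\
D_\lambda(0, s) &= 0, && (s_1 < s < s_0),\\
D_\lambda(y, s_1) &= V_\lambda(y, s_1), && (0 < y < y_\lambda(s_1)).
\end{aligned}
\tag{8.13}
$$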


Assuming for the moment that $V_\lambda(y, s_1) < V(y, s_1)$, this problem is more favorable
than the original one. Then

$$ V(y_\lambda(s_0), s_0) \ge V_\lambda(y_\lambda(s_0), s_0) = s_0^{-1}. \tag{8.14} $$

It follows that $(y_\lambda(s_0), s_0)$ is in the optimal stopping region for the original problem,
which gives the required inequality (ii).

It remains to verify that $b$ and $s_1$ can be selected so that

$$ V_\lambda(y, s_1) < V(y, s_1), \qquad (0 < y < s_1^{1/2}\alpha_\lambda(bs_1^{-\eta})), \tag{8.15} $$


whenever $s_0$ is sufficiently large. One of the results implicit in section 5 is a lower
bound on the minimum risk: $V^{(3)}(y, s) \le V(y, s)$. The function $V^{(3)}(y, s)$ has the
following property as $s \to \infty$:

$$
sV^{(3)}(\alpha s^{1/2}, s) =
\begin{cases}
\dfrac{\alpha\varphi(\alpha)}{\varphi(1)}\{1 + o(1)\}, & (0 < \alpha < 1),\\[1ex]
1 + o(1), & (\alpha > 1),
\end{cases}
\tag{8.16}
$$


where $o(1)$ applies uniformly in $\alpha > 0$. We now observe that $A_\lambda(\alpha)\varphi(1)/\{\alpha\varphi(\alpha)\}$
is bounded away from zero in some fixed interval $0 < \alpha < 1 + \epsilon$. Hence there
is a constant $k > 0$, not depending on the value of $c$, such that


$$ G_\lambda(c, \alpha) < \frac{\alpha\varphi(\alpha)}{\varphi(1)}(1 - 2ck), \qquad (0 < \alpha < 1 + \epsilon). \tag{8.17} $$

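Thus,

$$ s_1V_\lambda(\alpha s_1^{1/2}, s_1) < \frac{\alpha\varphi(\alpha)}{\varphi(1)}\,\frac{1 - 2bs_1^{-\eta}k}{M_\lambda(bs_0^{-\eta})}, \qquad (0 < \alpha < 1 + \epsilon). \tag{8.18} $$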

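We can now select $b$ and $s_1$ by first choosing a value for $c = bs_1^{-\eta}$ small enough
to ensure that (8.10) is applicable, with $\alpha_\lambda(bs_1^{-\eta}) < 1 + \epsilon$. Then $s_1$ can be chosen
large, according to (8.16), so that

$$ s_1V^{(3)}(\alpha s_1^{1/2}, s_1) > \frac{\alpha\varphi(\alpha)}{\varphi(1)}(1 - bs_1^{-\eta}k). \tag{8.19} $$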

This inequality, together with (8.18) and the general relation between $V^{(3)}(y, s)$
and $V(y, s)$, establishes that

$$ V_\lambda(\alpha s_1^{1/2}, s_1) < V(\alpha s_1^{1/2}, s_1)\,\frac{1 - 2bs_1^{-\eta}k}{M_\lambda(bs_0^{-\eta})\,(1 - bs_1^{-\eta}k)}, \qquad (0 < \alpha < \alpha_\lambda(bs_1^{-\eta})). \tag{8.20} $$


Finally, since $(1 - 2bs_1^{-\eta}k)/(1 - bs_1^{-\eta}k) < 1$ and since $M_\lambda(bs_0^{-\eta}) \to 1$ as $s_0 \to \infty$, the
required inequality (8.15) holds when $s_0$ is sufficiently large.


9.  A  formal  expansion  for  a  variation  of  the  control  problem 

The formulation of the spaceship control problem treated so far is based on
assumptions about the rate at which information is collected, and these led
to the stopping cost $D(y, s) = s^{-1}$. We shall indicate how the stopping cost

$$ D(y, s) = s^{-1} + a, \qquad (a > 0), \tag{9.1} $$

can arise for two distinct variations of these assumptions, and then suggest
formal expansions as $s \to \infty$ for this modified problem.

First suppose, as in section 2, that information is accumulated at the rate
$a_1\tau^{-2}$, where $-\tau$ is the time to go. Then the total information available is
$s^{-1} = -a_1\tau^{-1} + a_2$, and $D(y, s) = -a_3\tau^{-1}$. Previously, we assumed that $a_2 = 0$
and made a scale transformation to obtain the stopping cost. If we do not
assume $a_2 = 0$, the transformation leads to the form (9.1).

An alternative assumption is that information is collected at a constant rate
$1/\sigma^2$. Then, if $I_0$ is the total information to be accumulated by the time $\tau = 0$
when the target is reached, the information at time $\tau < 0$ is $s^{-1} = I_0 + \sigma^{-2}\tau < I_0$.
It turns out that

$$ D(y, s) = -\sigma^2\tau^{-1} = \frac{s}{I_0 s - 1}, \tag{9.2} $$

for $s > I_0^{-1}$. Once again, a linear transformation of the $y$, $s$ and cost coordinates
can be found which produces the form (9.1) without affecting the basic Wiener
process.
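For concreteness, here is one transformation of the kind just mentioned, written out under the assumption that (9.2) reads $D = -\sigma^2\tau^{-1}$ and that $y$ has variance $-ds$ per increment (as in the diffusion equation $\tfrac12 V_{yy} = V_s$). The scale factor $k > 0$ is introduced here only for illustration.

```latex
\begin{aligned}
s^{-1} = I_0 + \sigma^{-2}\tau
  \quad&\Longrightarrow\quad
  -\sigma^{2}\tau^{-1} = \frac{1}{I_0 - s^{-1}}
  = \frac{1}{I_0}\Bigl(1 + \frac{1}{I_0 s - 1}\Bigr),\\
s' = k(I_0 s - 1),\quad y' = (kI_0)^{1/2}\,y,\quad D' = \frac{I_0}{k}\,D
  \quad&\Longrightarrow\quad
  D' = \frac{1}{s'} + \frac{1}{k},\\
\operatorname{Var}(dy') = kI_0\,\lvert ds\rvert = \lvert ds'\rvert
  \quad&\Longrightarrow\quad
  y' \text{ is again a standard Wiener process in the scale } -s'.
\end{aligned}
```

So (9.1) holds with $a = 1/k$, and the boundary $y' = 0$ corresponds exactly to $y = 0$.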

Apparently, the most substantial effect of changing the stopping cost to
$s^{-1} + a$ occurs as $s \to \infty$. We shall initiate a formal expansion of the type
described in [4], [6] for the asymptotic behavior of the optimal boundary
$\tilde y(s) = s^{1/2}\tilde\alpha(s)$.

Let

$$ V(y, s) = a - 2a\{1 - \Phi(\alpha)\} + E\{f(s^{1/2}(\alpha + Z))\}, \tag{9.3} $$

where $\alpha = ys^{-1/2}$ and $\Phi$ is the standard normal distribution function. The expectation
is to operate on the Taylor expansion of $f(s^{1/2}(\alpha + Z))$ about $s^{1/2}\alpha$,
with $Z$ distributed as $\mathfrak{N}(0, 1)$. Thus, we have the boundary conditions

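$$ \frac{1}{s} + a = a - 2a\{1 - \Phi(\tilde\alpha)\} + f(s^{1/2}\tilde\alpha) + \frac{s}{2}f^{(2)}(s^{1/2}\tilde\alpha) + \frac{s^2}{8}f^{(4)}(s^{1/2}\tilde\alpha) + \cdots, \tag{9.4} $$

$$ \frac{\partial V}{\partial\alpha} = 0 = 2a\varphi(\tilde\alpha) + s^{1/2}f'(s^{1/2}\tilde\alpha) + \frac{s^{3/2}}{2}f^{(3)}(s^{1/2}\tilde\alpha) + \cdots. \tag{9.5} $$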
A first approximation to the functions $\tilde\alpha$ and $f$ for which the formal expansion
applies is given by

$$ \tilde\alpha^2(s) = 2\log s + \log\log s + \log(a^2/\pi), \tag{9.6} $$

$$ f(x) = 2x^{-2}\log x^2. \tag{9.7} $$
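As a rough check of (9.6) and (9.7), one can verify that they balance the leading terms of (9.4) and (9.5): with $x = s^{1/2}\tilde\alpha$, both $sf(x)$ and $-s^{1/2}f'(x)/\{2a\varphi(\tilde\alpha)\}$ should tend to 1 as $s \to \infty$, the second ratio expressing the leading cancellation in (9.5). The sketch below works with $\log s$ to avoid overflow and tests only the leading order, so slow, logarithmic convergence is to be expected. The function names are ours.

```python
import math


def alpha_sq(log_s, a):
    """First approximation (9.6): alpha~^2 = 2 log s + log log s + log(a^2 / pi)."""
    return 2 * log_s + math.log(log_s) + math.log(a * a / math.pi)


def leading_balance(log_s, a):
    """Return (s f(x), -s^(1/2) f'(x) / (2 a phi(alpha~))) at x = s^(1/2) alpha~.

    f(x) = 2 x^-2 log x^2 is (9.7). Both quantities are multiplied by s and written
    in terms of log s, so that very large s does not overflow.
    """
    al2 = alpha_sq(log_s, a)
    log_x = 0.5 * log_s + 0.5 * math.log(al2)                    # log of s^(1/2) alpha~
    s_f = 4 * log_x / al2                                         # s f(x)
    s_fprime = 4 * al2 ** -1.5 * (1 - 2 * log_x)                  # s * s^(1/2) f'(x)
    s_phi = math.exp(log_s - al2 / 2) / math.sqrt(2 * math.pi)    # s phi(alpha~)
    return s_f, -s_fprime / (2 * a * s_phi)


for log_s in (1e2, 1e3, 1e4):
    print(log_s, leading_balance(log_s, a=1.0))
```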

Further terms can be obtained by the techniques used in [4], [6]. Presumably,
the arguments used there would also apply here, showing that the expansion for
$\tilde\alpha(s)$ yields a valid approximation to the optimal boundary as $s \to \infty$.

In conclusion, we remark that (9.6) indicates the asymptotic form

$$ \tilde\alpha(s) \sim \sqrt{2\log s}, \qquad (s \to \infty), \tag{9.8} $$

which is very different from the previous case, where we obtained $\tilde\alpha(s) \to 1$ as
$s \to \infty$.